API Integration and First Steps

Are the API results immediate?

No. The default API service is asynchronous, because DocDigitizer ensures (near) 100% accuracy of the results. This means that, unlike OCR tools, we don’t require our customers to validate the data manually. Our output is verified and trusted data.

How much time does it take to get the results?

It depends on your subscription. When you choose your DocDigitizer subscription, you choose your maximum processing lead time. All documents will be processed within that lead time.

Example: If your subscription is set to a maximum of 8 business hours, all data will be available within 8 business hours of document submission.

How can I know when the data results are ready?

DocDigitizer provides two main integration methods for getting the data results:

  • PULL - The client periodically queries DocDigitizer's API with a document ID to check whether the results are available. The interval between queries should be set according to the processing lead time SLA of the subscription.
  • PUSH - When submitting the document, the client sends a callback URL. DocDigitizer calls this URL with the document ID and its state once the data results are available (more information about this integration can be found here).
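
As an illustration of the PULL method, the sketch below shows a simple polling loop in Python. It is not DocDigitizer's client code: fetch_status is a hypothetical function that you would write to wrap the actual API request, and the "ready" key is an assumed field name, not the real response format. The interval and attempt count should be derived from your subscription's processing lead time.

```python
import time
from typing import Callable, Optional


def poll_for_results(
    fetch_status: Callable[[str], dict],
    document_id: str,
    interval_seconds: float,
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Query for results until they are available, or give up.

    fetch_status is a hypothetical callable that wraps the real API
    request and returns a dict; the "ready" key is an assumed name.
    """
    for attempt in range(max_attempts):
        response = fetch_status(document_id)
        if response.get("ready"):
            return response
        # Wait before the next query, except after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return None
```

With an 8 business hour lead time, for example, a client might call poll_for_results(my_fetch, document_id, interval_seconds=1800, max_attempts=16), where my_fetch is your own wrapper around the API call. The sleep parameter is injectable only so the loop can be tested without waiting.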

How do I set up a new document layout on my subscription?

DocDigitizer provides out-of-the-box support for any document layout. There is no need to define layout templates or configurations.

Let's say that my document varies greatly in terms of layout. Do I need to train DocDigitizer with a dataset?

No. No training or specific configuration is needed. DocDigitizer is a managed software-as-a-service. Our team does all the training, model improvement, and model setup internally, with no customer intervention. Our customers receive (near) 100% accurate data from the very first document, with no code required.

Is the JSON API result always in the same format?

Yes. The responses will always be sent in the same format. The only part that varies is the data part of the response, which depends on the data of each document.

Can I have a FREE account to develop my integration?

No. DocDigitizer does not provide a free subscription. Nevertheless, we have standard subscriptions starting at a few € per month that can be used for integration and testing of standard use cases. Visit our pricing section.

Can I get only the automatic results without human validation?

Yes, but only for particular use cases where the information is highly structured and document quality is stable. This scenario requires an assessment project, so that an accurate study can be carried out and presented to the customer. A hybrid model is also available, where the customer receives both the automatic and the curated data for the same documents.

How much time does it take to set up DocDigitizer?

A few hours. Setting up DocDigitizer consists of defining the expected data schemas (the set of document types and their fields) and generating an API key. The setup is done mainly by DocDigitizer’s team, following the customer's requirements. Once it is complete, an API key is available and ready to process documents.

Does the quality of the document impact DocDigitizer’s performance?

Yes. We strongly mitigate the impact of document quality by having a human-in-the-loop who acts as a quality barrier and ensures that our customers receive (near) 100% correct data. Nevertheless, a document may be rejected due to low quality if the human cannot read it.

Are all documents reviewed by humans?

No. Our AI-assisted Human Revision Platform decides, for each data point, whether it makes sense for our human data curators to review it. The decision is based on hundreds of features collected throughout the process, such as machine learning confidence, similarity, performance within similar content, response time SLA, and document quality. Based on those features, a document may require no revision, or one or more revisions.

Do I know which documents are reviewed by humans?

No. As a managed service, DocDigitizer is responsible for managing its own service levels and process optimization. This is handled internally by our operations team and is not visible to our customers. Our customers receive (near) 100% accurate data regardless of how the process was carried out internally.

How does DocDigitizer ensure 100% accuracy?

Unlike traditional data capture solutions, DocDigitizer not only offers state-of-the-art machine data extraction but also bundles it with an expertly designed human-in-the-loop process, in which DocDigitizer's data validation team acts as a quality firewall, ensuring (near) 100% accuracy in any layout, language, format, or domain (for more details, please check our Terms of Service).
By combining AI with a built-in human-in-the-loop, we relieve our customers of the data validation process, so that the data output by DocDigitizer can be fed directly into the customer's software without any human validation on the customer's end.

Is DocDigitizer able to detect duplicated documents?

No. For privacy and data security reasons, DocDigitizer does not store any document data or document files, so we don't keep any historical data on our end apart from anonymized transaction information (for logging purposes). Therefore, it's not possible to compare information between documents or detect duplicates using DocDigitizer's API.

We suggest that our customers implement duplicate analysis on their end, either by comparing the returned entity schema content with their historical records (to detect content duplicates) or by using the document's MD5 hash (to detect file duplicates).
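
A minimal sketch of such client-side duplicate checks in Python is shown below. The function names and the in-memory set of seen hashes are illustrative assumptions; in practice the hashes would live in your own database, and the content comparison would depend on which fields of the returned data you consider significant.

```python
import hashlib
import json
from typing import Any, Dict, Set


def md5_of_bytes(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def is_duplicate_file(data: bytes, seen_hashes: Set[str]) -> bool:
    """Report whether this exact file was seen before, recording it if not."""
    digest = md5_of_bytes(data)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False


def content_fingerprint(entity: Dict[str, Any]) -> str:
    """Serialize extracted data with sorted keys so equal content compares equal."""
    return json.dumps(entity, sort_keys=True, separators=(",", ":"))
```

The MD5 check only catches byte-identical files. A rescanned copy of the same invoice produces a different hash, which is why comparing the extracted content (here via content_fingerprint) is a useful second check.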

What kind of alphabets are supported by DocDigitizer?

Latin. DocDigitizer supports all languages based on the Latin alphabet. The most popular languages processed by DocDigitizer customers are English, French, German, Nordic languages (Danish, Norwegian, Swedish, Finnish), Spanish, Italian, Czech, Slovak, Polish, Hungarian and Romanian.

Does DocDigitizer provide 95%, 99%, 99.5%, 99.99%, or 100% accuracy?

Accuracy is the outcome of a cross-check between a machine result and verification by a human-in-the-loop. Even if you cross-check results between multiple humans (as we do when needed), there will still be an underlying error rate, because humans also make mistakes.
DocDigitizer's commitment is to provide results with (near) 100% accuracy: data of a quality far above what customers can achieve by deploying their own human-in-the-loop. This lets our customers skip human validation altogether, because adding their own human-in-the-loop would add cost to the process without adding value.
In most DocDigitizer subscriptions, that translates into 99% or higher accuracy per field, and in some very quality-demanding processes 99.5% or higher accuracy per field.
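
To make the per-field figures concrete, the short Python sketch below estimates how many incorrect fields to expect in a batch. It assumes that per-field accuracy means the fraction of returned fields that are correct; this reading is an assumption for illustration, and the Terms of Service remain the reference for how accuracy is actually measured.

```python
def expected_field_errors(field_count: int, per_field_accuracy: float) -> float:
    """Expected number of incorrect fields, assuming accuracy is the
    fraction of returned fields that are correct (an illustrative reading)."""
    return field_count * (1.0 - per_field_accuracy)
```

Under this reading, a batch of 10,000 extracted fields at 99% per-field accuracy would contain about 100 incorrect fields, and about 50 at 99.5%.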

Why am I receiving empty fields (without any value) or fields with "None"?

An empty field is a field whose content is “” (i.e., an empty string) or the keyword None.

There are two main reasons for empty fields:

a) The original document does not contain that information, or the information is illegible;

b) A human error occurred while curating the document (this is accounted for in the subscribed accuracy).
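
On the integration side, it can help to flag empty fields before loading the data into your own system. The Python sketch below is one way to do that. It assumes the extracted fields have already been parsed into a dictionary, and it treats a null value, an empty string, and the literal text "None" as empty; the exact representation in your responses should be checked against real data.

```python
from typing import Any, Dict, List


def is_empty_field(value: Any) -> bool:
    """True for None (a parsed JSON null), an empty string, or the text "None"."""
    return value is None or value == "" or value == "None"


def empty_field_names(fields: Dict[str, Any]) -> List[str]:
    """Return the names of all empty fields, in their original order."""
    return [name for name, value in fields.items() if is_empty_field(value)]
```

For a hypothetical result such as {"total": "12.50", "vat": "", "date": None}, empty_field_names returns the names "vat" and "date", which can then be checked against the original document.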

If there is an error in my data results, what should I do?

If you identify any error in your data results, you should report the issue (see how).

Do you process PDFs with more than 1 document?

Multi-document PDF support is only available in the DocDigitizer V2 API on Enterprise plans.