No. The default API service is asynchronous. This is because DocDigitizer ensures (near) 100% accuracy of the results. This means that, unlike OCR tools, we don’t require our customers to validate the data manually. Our output is verified and trusted data.
It depends on your subscription. When you choose your DocDigitizer subscription, you choose your maximum processing lead time. All documents will be processed within that lead time.
Example: If your subscription is set to a max of 8 business hours, all data will be available up to 8 business hours after document submission.
DocDigitizer provides two main integration methods for getting the data results:
- PULL - In this method, the client will query DocDigitizer's API from time to time with a document ID and check whether the results are available or not. The time window between queries should always be set according to the subscription processing lead time SLA.
- PUSH - In this method, when submitting the document, the client will send a callback URL; this URL will be called with the document ID and its state when the document has data results available. (more information about this integration can be found here).
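The two methods above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not DocDigitizer's actual API: the `fetch_status` callable stands in for whatever HTTP call your client makes, and the field names `document_id` and `state`, as well as the state value `"done"`, are hypothetical placeholders for illustration.

```python
import json
import time
from typing import Callable, Optional

# Hypothetical state value; the real API may use a different name.
READY_STATE = "done"


def poll_for_results(
    fetch_status: Callable[[str], dict],
    document_id: str,
    interval_seconds: float,
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """PULL: query with the document ID until results are ready.

    `fetch_status` is an assumed wrapper around the client's API call.
    Returns the ready response, or None if attempts run out.
    """
    for attempt in range(max_attempts):
        response = fetch_status(document_id)
        if response.get("state") == READY_STATE:
            return response
        # Wait between queries; skip the wait after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return None


def handle_callback(raw_body: str) -> Optional[str]:
    """PUSH: parse a callback body and return the document ID if ready.

    Assumes a JSON body carrying the document ID and its state.
    """
    payload = json.loads(raw_body)
    if payload.get("state") == READY_STATE:
        return payload.get("document_id")
    return None
```

When polling, choose `interval_seconds` in proportion to your subscription's processing lead time; querying every few seconds brings no benefit when results are promised within business hours.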
DocDigitizer provides out-of-the-box support for any document layout. There is no need to define layout templates or configurations.
Let's say that my document varies greatly in terms of layout. Do I need to train DocDigitizer with a dataset?
No. No training or specific configuration is needed. DocDigitizer is a managed software as a service. Our team does all the training, model improvement, and model setup internally, with no customer intervention. For our customers, we provide 100% accurate data from the first document, with no code required.
Yes. The responses will always be sent in the same format. The only part that varies is the data part of the response, which depends on the data of each document.
No. DocDigitizer does not provide a free subscription. Nevertheless, we have standard subscriptions starting at a few € per month that can be used for integration and testing with standard use cases. Visit our pricing section.
Yes. But only for particular use-cases where the information is very structured and quality is stable. This scenario requires developing an assessment project so that an accurate study may be undertaken and presented to the customer. A hybrid model is also available, where the customer receives both the automatic and the curated data for the same documents.
A few hours. Setting up DocDigitizer consists of defining the expected data schemas (the set of document types and their fields) and generating an API key. The setup process is done mainly by DocDigitizer’s team, following our customer's requirements. After this setup process, an API key will be available and ready to process documents.
Yes. We strongly mitigate the impact of the document quality by having a human-in-the-loop that will act as a quality barrier and ensure that our customer always receives 100% correct data. Nevertheless, the document may be rejected due to low quality if the human cannot read it.
No. Our AI-assisted Human Revision Platform decides, for each data point, whether it makes sense for our human data curators to review it. The decision is based on hundreds of features collected throughout the process, such as Machine Learning Confidence, Similarity and Performance within similar content, Response Time SLA, Document Quality… Based on those features, a document may require no revision, or one or more revisions.
No. DocDigitizer is responsible for managing its level of services and process optimization as a managed service. This is done internally in our operation and is not visible to our customers. Our customers will always receive 100% accurate data independent of how the process was internally done.
Unlike traditional data capture solutions, DocDigitizer not only offers state-of-the-art machine data extraction but bundles it with an expertly designed human-in-the-loop process, in which DocDigitizer's data validation team acts as a quality firewall ensuring (near) 100% accuracy in any layout, language, format or domain (for more details, please check our Terms of Service).
By harnessing the power of AI working in harmony with a built-in human-in-the-loop, we unburden our customers from the data validation process, so that the data output by DocDigitizer can be fed directly into other software without any human validation on the customer's end.
No. For privacy and data security reasons, DocDigitizer does not store any document data or document files, so we don't keep any historical data on our end apart from anonymized transaction information (for logging purposes). Therefore, it's not possible to compare information between documents or detect duplicates using DocDigitizer's API.
We suggest our customers implement the duplication analysis on their end, either by comparing the returned entity schema content with their historical records (to detect content duplicates) or by using the document MD5 hash (to detect file duplicates).
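A customer-side duplicate check along these lines could look like the sketch below. It is only an illustration: the `DuplicateDetector` class and its in-memory sets are hypothetical stand-ins for your own historical records, and hashing a canonical JSON form of the returned data is one possible way to compare content, not something the API prescribes.

```python
import hashlib
import json


def file_md5(file_bytes: bytes) -> str:
    """MD5 hex digest of the raw file, used to detect file duplicates."""
    return hashlib.md5(file_bytes).hexdigest()


def content_fingerprint(entity: dict) -> str:
    """Fingerprint of the returned data, independent of key order.

    Serializing with sorted keys is an assumption made for this sketch,
    so that two responses with the same content hash identically.
    """
    canonical = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


class DuplicateDetector:
    """Hypothetical in-memory history; replace with your own storage."""

    def __init__(self) -> None:
        self.seen_files = set()
        self.seen_contents = set()

    def check(self, file_bytes: bytes, entity: dict) -> dict:
        """Report whether the file or its content was seen before,
        then record both for future checks."""
        file_key = file_md5(file_bytes)
        content_key = content_fingerprint(entity)
        result = {
            "file_duplicate": file_key in self.seen_files,
            "content_duplicate": content_key in self.seen_contents,
        }
        self.seen_files.add(file_key)
        self.seen_contents.add(content_key)
        return result
```

Keeping the two checks separate matters because the same invoice can arrive as two different files (for example, a scan and a native PDF): the file hashes differ, while the extracted content matches.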
Latin. DocDigitizer supports all languages based on the Latin alphabet. The most popular languages processed by DocDigitizer customers are English, French, German, Nordic languages (Danish, Norwegian, Swedish, Finnish), Spanish, Italian, Czech, Slovak, Polish, Hungarian and Romanian.
Accuracy is an outcome that comes from a cross-check between a result coming from a machine and verification of a human-in-the-loop. Even if you cross-check results between multiple humans (as we do when needed), you can guess that you will still have an underlying error because humans also fail.
DocDigitizer's commitment is to provide results with (near) 100% accuracy: data with a level of quality far above what customers can get by deploying their own human-in-the-loop. This enables our customers to skip any human validation, because adding their own human-in-the-loop would not bring any added value to the process, only costs.
In most DocDigitizer subscriptions, that translates into a +99% accuracy per field, and in some highly quality-demanding processes, +99.5% accuracy per field.
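To get a feel for what a per-field accuracy figure means in practice, the arithmetic can be sketched as below. The function names are illustrative, and the second function assumes that field errors are independent of each other, which is a simplification made for this example and not a statement about how DocDigitizer measures accuracy.

```python
def expected_field_errors(field_count: int, accuracy_per_field: float) -> float:
    """Expected number of incorrect fields among `field_count` fields."""
    return field_count * (1.0 - accuracy_per_field)


def all_fields_correct_probability(fields_per_document: int,
                                   accuracy_per_field: float) -> float:
    """Probability that every field in one document is correct.

    Assumes independent field errors (a simplification for illustration).
    """
    return accuracy_per_field ** fields_per_document


# At 99% per field, about 10 fields in every 1,000 are expected to be wrong;
# at 99.5%, about 5 in every 1,000.
errors_at_99 = expected_field_errors(1000, 0.99)
errors_at_995 = expected_field_errors(1000, 0.995)
```

Under the independence assumption, a 10-field document at 99% per-field accuracy is fully correct roughly 90% of the time, which is why per-field and per-document accuracy should not be confused when sizing downstream exception handling.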
An empty field is a field with content equal to “” (i.e., an empty string) or with the keyword None.
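On the client side, this definition can be checked with a small helper like the sketch below. Whether the keyword arrives as a JSON null or as the literal string "None" is not stated here, so the helper treats both as empty; that choice, and the function names, are assumptions for illustration.

```python
def is_empty_field(value) -> bool:
    """True if the value matches the empty-field definition.

    Assumption: the None keyword may arrive either as a null value
    (Python None after JSON parsing) or as the string "None".
    """
    return value is None or value == "" or value == "None"


def find_empty_fields(fields: dict) -> list:
    """Return the names of all empty fields, in their original order."""
    return [name for name, value in fields.items() if is_empty_field(value)]
```

Note that values such as `0` or `"0"` are not treated as empty, since they carry content.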
There are two main reasons for empty fields:
a) The original document doesn't have that information in its content, or it is illegible;
b) A human error occurred while curating the document (accounted for in the subscribed accuracy).
If you identify any error in your data results, you should report the issue, see how.
Multi-document PDF support is only available on the DocDigitizer V2 API on Enterprise Plans.
Updated about 1 year ago