Frequently Asked Questions

What everyone asks us.

Does DocDigitizer provide results in real-time?

No. DocDigitizer is an asynchronous API. Every document has a processing lead time and, once processing is done, results can either be pushed to a callback or queried via the API. The maximum lead time is agreed with each customer when their license is created. Currently, lead times range from under 1 minute to under 16 business hours, depending on the use case.
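
For customers who query results rather than receive a callback, the client side usually comes down to polling with a deadline. The sketch below is illustrative only: `fetch_result` is a hypothetical function you would write around the actual API call, and it is assumed to return `None` while the document is still being processed. None of these names come from DocDigitizer's API.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_result: Callable[[], Optional[dict]],
    timeout_s: float,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_result repeatedly until it returns a result or the deadline passes.

    fetch_result is a hypothetical wrapper around the real "query result" call;
    it must return None while processing is still in progress.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch_result()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError("result not ready within the expected lead time")
        sleep(interval_s)
```

A reasonable choice is to set `timeout_s` to the maximum lead time agreed in your license and to keep `interval_s` large enough that polling does not generate unnecessary traffic.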

Can I get only the automatic results without human validation?

Yes, but only for particular use cases where the information is highly structured and quality is stable. This scenario requires an assessment project, so that an accuracy study can be carried out and presented to the customer. There is also a hybrid model, in which the customer receives both the automatic and the curated data for the same documents.
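
To give a feel for what such an accuracy study measures, here is a minimal sketch of a field-level comparison between automatic and curated results, as a customer on the hybrid model might run it. The record layout (one dict of field names to values per document) is an assumption made for illustration, not DocDigitizer's actual output format or methodology.

```python
from typing import Dict, List


def field_agreement(automatic: List[Dict], curated: List[Dict]) -> float:
    """Return the share of curated fields that the automatic result matched exactly.

    Both lists are assumed to hold one dict per document, in the same order.
    """
    if len(automatic) != len(curated):
        raise ValueError("both result sets must cover the same documents")
    matched = 0
    total = 0
    for auto_doc, curated_doc in zip(automatic, curated):
        for field_name, expected in curated_doc.items():
            total += 1
            if auto_doc.get(field_name) == expected:
                matched += 1
    if total == 0:
        raise ValueError("no fields to compare")
    return matched / total
```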

How much time does it take to set up DocDigitizer?

A few hours. Setting up DocDigitizer consists of defining the expected data schemas (the set of document types and their fields) and generating an API key. The setup is done mainly by DocDigitizer’s team, following the customer's requirements. Once it is complete, the API key is available and ready to process documents.
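
As a rough illustration of what "a set of document types and their fields" can look like when you prepare your requirements, the sketch below models a schema in Python and checks an extracted record against it. The structure, the `invoice` document type, and all field names are hypothetical; the real schema format is defined together with DocDigitizer's team.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    required: bool = True


@dataclass(frozen=True)
class DocumentType:
    name: str
    fields: Tuple[FieldSpec, ...]


# Hypothetical document type, for illustration only.
INVOICE = DocumentType(
    name="invoice",
    fields=(
        FieldSpec("invoice_number"),
        FieldSpec("total_amount"),
        FieldSpec("due_date", required=False),
    ),
)


def missing_required(doc_type: DocumentType, extracted: Dict) -> List[str]:
    """List the required fields that are absent or empty in an extracted record."""
    return [
        spec.name
        for spec in doc_type.fields
        if spec.required and extracted.get(spec.name) in (None, "")
    ]
```

Writing the schema down this explicitly before contacting the team tends to surface questions early, such as which fields are mandatory for your downstream systems.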

If I have a new document, do I need to run any setup?

Yes. You need to contact your customer success manager and provide them with the new data schema. The new data schema will be added to your license within a few hours.

Let's say that my document varies a lot in terms of layout. Do I need to train DocDigitizer with a dataset?

No. No training or specific configuration is needed. DocDigitizer is a managed software as a service: all training, model improvement, and model setup are done internally by our team, with no customer intervention. Our customers receive 100% accurate data from the first document, with no code required.

Does DocDigitizer process handwritten documents?

Yes. We support data capture from handwritten documents.

Does the quality of the document impact DocDigitizer’s performance?

Yes. We strongly mitigate the impact of document quality by having a human in the loop who acts as a quality barrier, ensuring that our customers always receive 100% correct data. Nevertheless, if a human cannot read the document, it may be rejected due to low quality.

Are all documents reviewed by humans?

No. Our AI-assisted Human Revision Platform decides, for each data point, whether it makes sense for our human data curators to review it. The decision is based on hundreds of features collected throughout the process, such as machine learning confidence, similarity and performance within similar content, response time SLA, and document quality. Based on those features, a document may require no revision, or one or more revisions.

Do I know which documents are reviewed by humans?

No. As a managed service, DocDigitizer is responsible for managing its own service levels and process optimization. This is handled internally and is not visible to our customers. Our customers always receive 100% accurate data, regardless of how the document was processed internally.

Is it secure to have humans looking at my documents?

Yes. DocDigitizer’s data curation team follows strict privacy and security guidelines. DocDigitizer has a GDPR and Data Handling contract in place that offers our customers a compliance framework fully aligned with the most demanding requirements.

What kind of alphabets are supported by DocDigitizer?

Latin. DocDigitizer supports all languages based on the Latin alphabet. The most popular languages processed by DocDigitizer customers are English, French, German, Nordic languages (Danish, Norwegian, Swedish, Finnish), Spanish, Italian, Czech, Slovak, Polish, Hungarian and Romanian.

Can DocDigitizer be implemented on-premises?

Yes. Our community edition can be deployed on-premises. This version is available for large inbound volumes (more than 250k documents per month). Nevertheless, to ensure up-to-date, widely scalable security, maintenance, and regular updates, we recommend our cloud-based version, available on a shared or private tenant.

Where are your servers located?

European Union. DocDigitizer runs on Google and Microsoft data centers within the European Union. Enterprise customers can have DocDigitizer deployed on different cloud data storage in a different country.

What is the price for a DocDigitizer subscription?

Before we can give you a price estimate, we need to understand your requirements. In addition to your estimated annual document volume, we need to know other information, including which data fields you want to extract and your requirements regarding data processing lead time.

If you are interested in getting a quote from us, please fill in and send us this form, and one of our experts will get back to you promptly. A free trial is available.

Is DocDigitizer GDPR-compliant?

Yes. We are fully committed to ensuring compliance with GDPR. We process documents provided by customers for the primary purpose of data capture, based on the instructions of our customers, and always for a limited time period.

DocDigitizer is a document processing and data extraction service, not a document or data storage service. Documents and captured data will be automatically removed within 48 hours after data extraction.

You can read more about this in our terms and conditions.

What types of documents does DocDigitizer support?

All. DocDigitizer can extract data from any human-readable content, regardless of layout, domain, or type, so you can use DocDigitizer to process any document, email, or photo.

For DocDigitizer to successfully capture data from any document, two conditions apply:

  • Content must be in Latin characters,
  • An Entity Schema must be defined and added to your DocDigitizer License.

Is DocDigitizer able to detect duplicated documents?

No. For privacy and data security reasons, DocDigitizer does not store any document data or document files, so we don't keep any historical data on our end apart from anonymized transaction information (for logging purposes). Therefore, it's not possible to compare information between documents or detect duplicates using DocDigitizer's API.

We suggest that our customers implement duplicate analysis on their end, either by comparing the returned entity schema content with their historical records (to detect content duplicates) or by using the document's MD5 hash (to detect file duplicates).
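
Both checks can be sketched on the customer side in a few lines. The example below computes the file's MD5 hash locally with Python's standard `hashlib` and fingerprints the extracted fields by hashing their canonical JSON form. The in-memory `DuplicateIndex` class, the field names, and the use of SHA-256 for the content fingerprint are illustrative assumptions; in practice the hashes would live in your own database of historical records.

```python
import hashlib
import json
from typing import Dict, Set


def file_md5(data: bytes) -> str:
    """MD5 hex digest of the raw file bytes (detects identical files)."""
    return hashlib.md5(data).hexdigest()


def content_fingerprint(fields: Dict) -> str:
    """Hash of the extracted fields, independent of key order (detects identical content)."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DuplicateIndex:
    """Hypothetical in-memory stand-in for the customer's historical records."""

    def __init__(self) -> None:
        self._file_hashes: Set[str] = set()
        self._content_hashes: Set[str] = set()

    def check_and_add(self, data: bytes, fields: Dict) -> Dict[str, bool]:
        file_hash = file_md5(data)
        content_hash = content_fingerprint(fields)
        result = {
            "file_duplicate": file_hash in self._file_hashes,
            "content_duplicate": content_hash in self._content_hashes,
        }
        # Record the document so later submissions can be compared against it.
        self._file_hashes.add(file_hash)
        self._content_hashes.add(content_hash)
        return result
```

The two checks complement each other: a rescanned copy of the same invoice produces different file bytes, so only the content comparison catches it, while the MD5 check can run before the document is even submitted.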

How does DocDigitizer ensure 100% accuracy?

Unlike traditional data capture solutions, DocDigitizer not only offers state-of-the-art machine data extraction but also bundles it with an expertly designed human-in-the-loop process, in which DocDigitizer's data validation team acts as a quality firewall, ensuring (near) 100% accuracy in any layout, language, format, or domain (for more details, please check our Terms of Service).
By harnessing AI working in harmony with a built-in human in the loop, we unburden our customers from the data validation process, so that the data output by DocDigitizer can be fed directly into software without any human validation on the customer's end.