Things you should know

API Endpoints and Environments

The DocDigitizer V2 API has two environments available:

Testing
- Auth https://authpprod.docdigitizer.com/
- API gateway https://assetgwpprod.docdigitizer.com/
Production
- Auth https://auth.docdigitizer.com/
- API gateway https://assetgw.docdigitizer.com/
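
As a minimal sketch, the two environments above could be kept in a small lookup table so that client code can switch between them. The URLs are copied from the list above; the names `ENVIRONMENTS` and `base_urls` are illustrative only and are not part of the DocDigitizer API.

```python
# Base URLs copied from the environment list above.
# ENVIRONMENTS and base_urls are hypothetical helper names.
ENVIRONMENTS = {
    "testing": {
        "auth": "https://authpprod.docdigitizer.com/",
        "gateway": "https://assetgwpprod.docdigitizer.com/",
    },
    "production": {
        "auth": "https://auth.docdigitizer.com/",
        "gateway": "https://assetgw.docdigitizer.com/",
    },
}


def base_urls(environment: str) -> dict:
    """Return the auth and API gateway base URLs for the given environment."""
    try:
        return ENVIRONMENTS[environment.lower()]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
```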

If you already have your API Key, you can also try out each method from its specification page by choosing the matching "Base URL".

HTTP-Based, RESTful, with SSL

We attempt to follow the principles of Representational State Transfer (REST). This means the DocDigitizer PowerCapture API does not store state or sessions. Most endpoints use the JSON data format for requests and responses.

We leverage the verbs of the HTTP protocol: methods that retrieve data require a GET request, while methods that send data might require a PUT or a POST.

All communication with the API must be made over HTTPS (SSL/TLS). This is extremely important.
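
The conventions above (HTTP verbs, JSON bodies, HTTPS only) might look like this on the client side. This is a sketch under assumptions: `build_json_request` is a hypothetical helper, the path `hypothetical/resource` is a placeholder rather than a real endpoint, and authentication is deliberately left out. The request is only built, not sent.

```python
import json
import urllib.request
from urllib.parse import urljoin, urlsplit


def build_json_request(base_url, path, method="GET", payload=None):
    """Build (but do not send) an HTTPS-only request with an optional JSON body."""
    url = urljoin(base_url, path)
    # Refuse plain HTTP on the client side, mirroring the rule stated above.
    if urlsplit(url).scheme != "https":
        raise ValueError("the API only accepts HTTPS")
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Placeholder path; replace it with a path from the method specifications.
request = build_json_request(
    "https://assetgwpprod.docdigitizer.com/", "hypothetical/resource", "POST", {"key": "value"}
)
```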

By design:
We do not accept any insecure HTTP communication, neither inbound nor outbound.
We do not accept cryptographic protocols that are known to be vulnerable or that use weak ciphers. Specifically, we reject all versions of SSL (Secure Sockets Layer) and versions 1.0 and 1.1 of TLS (Transport Layer Security).
We do not accept weak ciphers on TLS version 1.2. More specifically, we only accept the following cipher suites:
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
We plan to decommission TLS 1.2 and accept only connections using TLS version 1.3 (or above). Unfortunately, we have not been able to take that step yet because of legacy systems at some of our customers and in a few cloud services that we use. We apologize for that.
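
On the client side, the protocol rules above can be mirrored by refusing anything older than TLS 1.2. The following is a minimal sketch using Python's standard `ssl` module; `make_tls_context` is a hypothetical helper name, and the sketch does not restrict the cipher list, which is left to the negotiation with the server.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client-side context that refuses anything older than TLS 1.2."""
    # create_default_context keeps certificate and hostname verification enabled.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The resulting context can be passed as the `context` argument of `urllib.request.urlopen`.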

File / Upload Limitations

We currently support only one file per call. If more than one file is included, processing continues with the first file found and the remaining files are skipped. The submitted file must not exceed 25 MB and must be one of the following media types:

application/pdf
image/jpeg
image/png
image/tiff

If your request does not include the media type information, we'll try to infer it from the file name extension.
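
A client can check these limits before uploading. The sketch below is illustrative only: `check_upload`, `MEDIA_TYPES`, and `MAX_UPLOAD_BYTES` are hypothetical names, the extension table is an assumption (the text does not say which extensions the service recognizes), and the 25 MB limit is read here as 25 × 1024 × 1024 bytes.

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB

# Assumed extension-to-media-type table, limited to the supported types above.
MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return the media type for an acceptable file, or raise ValueError."""
    extension = os.path.splitext(filename)[1].lower()
    media_type = MEDIA_TYPES.get(extension)
    if media_type is None:
        raise ValueError(f"unsupported file type: {filename!r}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file exceeds {MAX_UPLOAD_BYTES} bytes: {filename!r}")
    return media_type
```

Setting the media type explicitly in the request, rather than relying on inference from the extension, avoids surprises with unusual file names.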

Confidence

As mentioned in the Guidelines, the Document Classification and Data Extraction tasks can be done by humans and by Machine Learning.

It is possible to configure some fields or document classifications to be handled only by Machine Learning.
In these cases, it is important to take into account the returned confidence value, which indicates how accurate the Machine Learning is when extracting a certain field (e.g. Social Security Number) or classifying a certain type of document (e.g. Citizen Card). For example:

"features": {
	"5fc854bd-9608-4012-beec-16c52947d4ef": {
		"uuid": "5fc854bd-9608-4012-beec-16c52947d4ef",
		"name": "atm_receipt",
		"label": "ATM Receipt",
		"createdAt": "2021-12-09T16:02:34.173",
		"origin": 20,
		"originText": "MachineLearning",
		"confidence": 0.9975,
		"fields": {
			"66e356c3-df7f-4eb1-add9-b682fa9e2bd7": {
				"uuid": "66e356c3-df7f-4eb1-add9-b682fa9e2bd7",
				"name": "bank_card_last_digits",
				"label": "Últimos 4 dígitos de multibanco/cartão",
				"dataType": 10,
				"dataTypeText": "String",
				"createdAt": "2021-12-09T16:02:34.173",
				"confidence": 0.9975,
				"fieldValue": "6758",
				"origin": 20,
				"originText": "MachineLearning",
				"annotationInformation": null
			},
        ...
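
One way to act on these values is to flag Machine Learning results whose confidence falls below a chosen cut-off. This is a sketch under assumptions: `low_confidence_items` is a hypothetical helper, the 0.9 threshold is an arbitrary illustration rather than a recommended value, and `features` is assumed to be the already-parsed "features" object shown above.

```python
def low_confidence_items(features: dict, threshold: float = 0.9) -> list:
    """List (kind, name, confidence) for ML results scoring below the threshold."""
    flagged = []
    for feature in features.values():
        # Check the document classification itself, then each of its fields.
        candidates = [("document", feature)]
        candidates += [("field", f) for f in (feature.get("fields") or {}).values()]
        for kind, item in candidates:
            confidence = item.get("confidence")
            if (item.get("originText") == "MachineLearning"
                    and confidence is not None
                    and confidence < threshold):
                flagged.append((kind, item.get("name"), confidence))
    return flagged
```

Items returned by this helper could then be routed to manual review or rejected, depending on the use case.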

DocDigitizer PowerCapture Machine Learning automates training, with the goal of generating different machine learning models that achieve the highest degree of confidence (accuracy of the extracted data).

To better understand how DocDigitizer PowerCapture achieves 100% accuracy through AI/ML + Human in the Loop, please see Get Started with DocDigitizer PowerCapture.