HTTP-Based, RESTful, with SSL

We attempt to follow the principles of Representational State Transfer (REST). This means the DocDigitizer PowerCapture API stores neither 'state' nor 'sessions'. Most endpoints use the JSON data format for requests and responses.

We leverage the verbs (methods) of the HTTP protocol: methods that retrieve data require a GET request, while methods that send data may require a PUT or a POST.
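
As a rough illustration of this convention, the sketch below builds (but does not send) one read request and one write request with JSON bodies. The base URL, paths and payload are placeholders invented for the example, not real PowerCapture endpoints.

```python
import json
import urllib.request
from typing import Any, Optional

# Placeholder base URL (assumption): replace with the real API host.
BASE_URL = "https://api.example.com"


def build_request(method: str, path: str, payload: Optional[Any] = None) -> urllib.request.Request:
    """Build a stateless JSON request; nothing is sent over the network here."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Accept": "application/json"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=body, headers=headers, method=method)


# Retrieving data uses GET; sending data uses POST (or PUT).
read_request = build_request("GET", "/documents/123")             # hypothetical path
send_request = build_request("POST", "/documents", {"name": "example"})  # hypothetical path and payload
```

Because the API keeps no session, each request built this way carries everything the server needs to process it.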

All communication with the API must be made over SSL/TLS; this is extremely important.

By design:
We do not accept any insecure HTTP communication, neither inbound nor outbound.
We do not accept cryptographic protocols that are known to be vulnerable or to use weak ciphers. Specifically, we reject all versions of SSL (Secure Sockets Layer) and TLS (Transport Layer Security) versions 1.0 and 1.1.
We do not accept weak ciphers on TLS version 1.2. More specifically, we only accept the following ciphers:
Enabled features
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
We plan to decommission TLS 1.2 and accept only connections using TLS version 1.3 (or above). Unfortunately, we have not yet been able to take that step because of legacy systems still in use by some of our customers and by a few cloud services that we rely on. We apologize for the inconvenience.
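
On the client side, these constraints could be mirrored with Python's standard `ssl` module. This is only a sketch: the OpenSSL-style cipher names are the usual equivalents of the IANA names listed above, and the function name `build_client_context` is our own. TLS 1.3 cipher suites are not affected by `set_ciphers`, so they remain enabled.

```python
import ssl

# OpenSSL names for the TLS 1.2 cipher suites listed above.
ACCEPTED_TLS12_CIPHERS = [
    "ECDHE-ECDSA-AES128-SHA",         # TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
    "ECDHE-ECDSA-AES128-GCM-SHA256",  # TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    "ECDHE-ECDSA-AES256-SHA",         # TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
    "ECDHE-ECDSA-AES256-GCM-SHA384",  # TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    "ECDHE-ECDSA-CHACHA20-POLY1305",  # TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
    "ECDHE-RSA-AES128-GCM-SHA256",    # TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    "ECDHE-RSA-AES256-GCM-SHA384",    # TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    "ECDHE-RSA-CHACHA20-POLY1305",    # TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
]


def build_client_context() -> ssl.SSLContext:
    """Create a client context that refuses SSL, TLS 1.0 and TLS 1.1."""
    context = ssl.create_default_context()  # certificate and hostname checks stay on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restrict TLS 1.2 to the accepted suites; ciphers the local OpenSSL
    # build does not support are silently skipped.
    context.set_ciphers(":".join(ACCEPTED_TLS12_CIPHERS))
    return context


context = build_client_context()
```

The resulting context can be passed to `urllib.request.urlopen(..., context=context)` or to `http.client.HTTPSConnection(..., context=context)`.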

Confidence

As mentioned in the Guidelines, the Document Classification and Data Extraction tasks can be performed by Humans and by Machine Learning.

It is possible to configure some fields or document classifications to be handled only by Machine Learning.
In these cases, it is important to take into account the returned confidence value, which indicates how accurate Machine Learning is when extracting a given Field (e.g. Social Security Number) or classifying a given Type of Document (e.g. Citizen Card).

"features": {
    "5fc854bd-9608-4012-beec-16c52947d4ef": {
        "uuid": "5fc854bd-9608-4012-beec-16c52947d4ef",
        "name": "atm_receipt",
        "label": "ATM Receipt",
        "createdAt": "2021-12-09T16:02:34.173",
        "origin": 20,
        "originText": "MachineLearning",
        "confidence": 0.9975,
        "fields": {
            "66e356c3-df7f-4eb1-add9-b682fa9e2bd7": {
                "uuid": "66e356c3-df7f-4eb1-add9-b682fa9e2bd7",
                "name": "bank_card_last_digits",
                "label": "Últimos 4 dígitos de multibanco/cartão",
                "dataType": 10,
                "dataTypeText": "String",
                "createdAt": "2021-12-09T16:02:34.173",
                "confidence": 0.9975,
                "fieldValue": "6758",
                "origin": 20,
                "originText": "MachineLearning",
                "annotationInformation": null
            },
        ...
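
One way a client might act on these values is to flag Machine Learning results whose confidence falls below a cut-off. The sketch below assumes the response shape shown above; the threshold of 0.95 and the function name `low_confidence_items` are illustrative choices, not part of the API.

```python
from typing import Any, Dict, List

# Hypothetical cut-off (assumption): choose a value that suits your use case.
CONFIDENCE_THRESHOLD = 0.95


def low_confidence_items(features: Dict[str, Any],
                         threshold: float = CONFIDENCE_THRESHOLD) -> List[str]:
    """Return the names of ML classifications and fields below the threshold."""
    flagged: List[str] = []
    for feature in features.values():
        # Document classification confidence.
        if (feature.get("originText") == "MachineLearning"
                and feature.get("confidence", 0.0) < threshold):
            flagged.append(feature["name"])
        # Per-field extraction confidence.
        for field in feature.get("fields", {}).values():
            if (field.get("originText") == "MachineLearning"
                    and field.get("confidence", 0.0) < threshold):
                flagged.append(feature["name"] + "." + field["name"])
    return flagged


# Trimmed copy of the sample response above.
sample_features = {
    "5fc854bd-9608-4012-beec-16c52947d4ef": {
        "name": "atm_receipt",
        "originText": "MachineLearning",
        "confidence": 0.9975,
        "fields": {
            "66e356c3-df7f-4eb1-add9-b682fa9e2bd7": {
                "name": "bank_card_last_digits",
                "originText": "MachineLearning",
                "confidence": 0.9975,
                "fieldValue": "6758",
            }
        },
    }
}

needs_review = low_confidence_items(sample_features)
```

Items returned by such a check could then be routed to manual review or rejected, depending on the process in place.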

DocDigitizer PowerCapture Machine Learning automates training, generating different machine learning models in order to reach the highest possible degree of confidence (accuracy of the extracted data).

To better understand how DocDigitizer PowerCapture achieves 100% accuracy by combining AI/ML with Human in the Loop, please see Get Started with DocDigitizer PowerCapture.