DocDigitizer API
This API is intended for use by developers and external systems
It makes available the necessary integration mechanisms that allow for the integration of DocDigitizer services.

First steps:

  • Get an API KEY from DocDigitizer (www.docdigitizer.com) or use the default one already configured on this page
  • Use your API key to authenticate on this documentation page (Authorize button)
  • View and Try the different endpoints below
184184

Things you should know

HTTP-Based, RESTful, with SSL
We attempt to follow the principles of Representational State Transfer (REST), This means DocDigitizerAPI does not store 'state' nor 'sessions'. Most of the endpoints use JSON data format for responses and, for most of the cases, requests.

We leverage the verbosity of the HTTP protocol. Methods that retrieve data require a GET request, methods that send it might require a PUT or a POST.

All communication with the API should be made over SSL (this is extremelly important)

Your data schema might not be my data schema
Just picture this: you read all of our docs, go through all the examples. Try it out, like what you see, but we're not dealing with the specific scenario for your business need. Are we not a good fit? Of course we are! :)

We focused our documentation and examples on an invoices and accounting scenario, since we feel that can be an area easy enough to grasp for both tech and non-techs alike. But at the moment we deal with clients in many sectors with very different and specific data needs. Even though we don't support the creation of data schemas through our API (yet) that is something we can do to target our app's needs. Make sure to drop us a line if you think you'd need some help with setting up a different data schema.

Authentication
DocDigitizer API uses API Key authentication for all the endpoints that need to be authenticated.
This is a specific schema you'll need to send along in the header for all the requests sent by your
app (our api doesn't store 'state' nor 'sessions' in our side).

To authenticate your requests, you'll just need to set the "Authentication" request header. Something like so:

624624

We currently have a file size limit of 25Mb for the upload and support the following media types:

  • application/pdf
  • image/jpeg
  • image/png
  • image/tiff

In case you don't build your request with the media type information, we'll try to infer it from the file's name extension.

Request Status

When you do a request you should receive one of the following status information, as you note some of them are quite standard but some others have special meanings that you could use in your own integration flow with DocDigitizer API

Status Code

Message Code

Message Description

200

OK

None

201

CREATED

None/Depend of use case

202

ACCEPTED

None/Depend of use case

400

BAD_REQUEST

None/Depend of use case

401

WRONG_AUTH

Authorization refused. No such user

403

NOT_ENOUGH_PERMISSIONS

Access refused due to not enough permissions

403

NO_LICENCE

No license for the user's organization

403

EXPIRED_LICENCE

License expired

404

NOT_FOUND

None/Depend of use case

404

INVALID_OUTPUT_FORMAT

None/Depend of use case

409

CONFLICT

None/Depend of use case

415

FILE_TYPE_NOT_SUPPORTED

Unsupported Media type

422

INCOHERENT_DATA

Special case of invalid content

500

UNEXPECTED_ERROR

None/Depend of use case

501

NOT_IMPLEMENTED

This feature is not implemented

Our Resources

These are the most relevant resources in DocDigitizer Api.

Documents
It all starts on a document. These are image or pdf files you'll want to send to the api to be annotated. You can later get a list of documents you annotated, get their most recent annotations or their original assets. Documents have a class (which give it then a target data schema for the data extractions to fill in).

Tasks
When you send your documents to the DocDigitizer API, we'll then queue it to the DocDigitizer Engine service. You can get some information about this task and poll the api for the tasks status before checking for the documents result.

Extractions
Extractions are the results from the DocDigitizer Engine for a document. These are machine generated and follow whatever data schema is set for the document's class.

Reviews
If you so choose, you can have your documents being reviewed by our team of Data Curators. The Reviews endpoints present the merged data of extractions and reviews and also follow whatever data schema is set for a document's class.

Smart Linking

You might have noticed by now that some of our responses send along a "links" node that might occur more than once in the
response body.

624624

Each of the links inside the "links" node provides a way of referencing other related resources within our api.

For instance, when you retrieve the annotations for a specific document.

624624

The links node tells you:

  • annotations, the endpoint for the annotations of that same document
  • assets, the endpoint to get the original assets for the document
  • collection, the endpoint to get the list of other documents associated to your organization
  • reviews, the endpoint to get the reviews for your that same document
  • self, the endpoint for the document itself

Response pagination

The endpoints that return lists are usually paginated. This impacts both the way you need to send your requests but also the data and metadata we send you on the responses.

KEY

REQ/RESP

WHERE

X-More-Results-Available

RESPONSE

HEADER

X-Cursor-ID

RESPONSE

HEADER

X-Cursor-Was-Broken

RESPONSE

HEADER

maxRetrieve

REQUEST

QUERY

cursorId

REQUEST

QUERY

X-More-Results-Available
We'll send this on the response headers. It lets you know wether or not there are more results to fetch. This applies to both:

  • there are so few results, that pagination is not even needed
  • (OR) you reached the last page of results

X-Cursor-ID
We'll send this on the response headers. This represents the position from where to start for the next page. In order to get it, you'll need to do the same request again (same endpoint, same parameters) but change the cursorId query parameter. On this second response, you will get a new X-Cursor-ID

X-Cursor-Was-Broken
We'll send this on the response headers. You should worry with X-Cursor-ID other than making sure to send it back to us in order to get a subsequent page of data. If for some reason the value you send to us is invalid or corrupt, we'll set X-Cursor-Was-Broken to true. If all is fine, false

maxRetrieve
You should send this as a parameter in the request query string. We'll apply pagination with a default page size (we specify each endpoint's default value), but you can change the page size by setting an integer value to maxRetrieve.

cursorId
You should send this as a parameter in the request query string. If you just got a response, and the X-More-Results-Available header is true, you should do another request exactly the same as the previous one but set the cursorId query parameter to whichever value you just got in X-Cursor-ID whenever you are ready to fetch the next page.