DocDigitizer API
This API is intended for use by developers and external systems
It makes available the necessary integration mechanisms that allow for the integration of DocDigitizer services.
First steps:
- Get an API KEY from DocDigitizer (www.docdigitizer.com) or use the default one already configured on this page
- Use your API key to authenticate on this documentation page (Authorize button)
- View and Try the different endpoints below
Things you should know
HTTP-Based, RESTful, with SSL
We attempt to follow the principles of Representational State Transfer (REST), This means DocDigitizerAPI does not store 'state' nor 'sessions'. Most of the endpoints use JSON data format for responses and, for most of the cases, requests.
We leverage the verbosity of the HTTP protocol. Methods that retrieve data require a GET request, methods that send it might require a PUT or a POST.
All communication with the API should be made over SSL (this is extremelly important)
Your data schema might not be my data schema
Just picture this: you read all of our docs, go through all the examples. Try it out, like what you see, but we're not dealing with the specific scenario for your business need. Are we not a good fit? Of course we are! :)
We focused our documentation and examples on an invoices and accounting scenario, since we feel that can be an area easy enough to grasp for both tech and non-techs alike. But at the moment we deal with clients in many sectors with very different and specific data needs. Even though we don't support the creation of data schemas through our API (yet) that is something we can do to target our app's needs. Make sure to drop us a line if you think you'd need some help with setting up a different data schema.
Authentication
DocDigitizer API uses API Key authentication for all the endpoints that need to be authenticated.
This is a specific schema you'll need to send along in the header for all the requests sent by your
app (our api doesn't store 'state' nor 'sessions' in our side).
To authenticate your requests, you'll just need to set the "Authentication" request header. Something like so:
File / Upload Limitations
We currently support only one file submitted by call, if more than one file is included the process will keep going with the first file found and skipped the remaining files. The submitted file must have a size limit of 25Mb for the upload, and support the following media types:
- application/pdf
- image/jpeg
- image/png
- image/tiff
In case you don't build your request with the media type information, we'll try to infer it from the file's name extension.
Date time
Our date time values are given in Coordinated Universal Time or UTC
Request Status
When you do a request you should receive one of the following status information, as you note some of them are quite standard but some others have special meanings that you could use in your own integration flow with DocDigitizer API
Status Code | Message Code | Message Description |
---|---|---|
200 | OK | None |
201 | CREATED | None/Depend of use case |
202 | ACCEPTED | None/Depend of use case |
400 | BAD_REQUEST | None/Depend of use case |
401 | WRONG_AUTH | Authorization refused. No such user |
403 | NOT_ENOUGH_PERMISSIONS | Access refused due to not enough permissions |
403 | NO_LICENCE | No license for the user's organization |
403 | EXPIRED_LICENCE | License expired |
404 | NOT_FOUND | None/Depend of use case |
404 | INVALID_OUTPUT_FORMAT | None/Depend of use case |
409 | CONFLICT | None/Depend of use case |
415 | FILE_TYPE_NOT_SUPPORTED | Unsupported Media type |
422 | INCOHERENT_DATA | Special case of invalid content |
500 | UNEXPECTED_ERROR | None/Depend of use case |
501 | NOT_IMPLEMENTED | This feature is not implemented |
Our Resources
These are the most relevant resources in DocDigitizer Api.
Documents
It all starts on a document. These are image or pdf files you'll want to send to the api to be annotated. You can later get their most recent Annotations. Documents have a class (which give it then a target data schema for the data extractions to fill in).
Tasks
When you send your documents to the DocDigitizer API, we'll then queue it to the DocDigitizer Engine service. Task also informs you about the Document ID (the identifier that you need to get information about your file) and SLA Moment (the maximum expected processing lead time).
Annotations
Annotations are the results from the DocDigitizer Engine for a document, these are machine generated and follow one of the data schemas accordingly to the document's class. When Document is 'reviewed' you can make sure that our team of Data Curators also reviewed this data.
Smart Linking
You might have noticed by now that some of our responses send along a "links" node that might occur more than once in the
response body.
Each of the links inside the "links" node provides a way of referencing other related resources within our api.
For instance, when you retrieve the annotations for a specific document.
The links node tells you:
- annotations, the endpoint for the annotations of that same document
- assets, the endpoint to get the original assets for the document
- collection, the endpoint to get the list of other documents associated to your organization
- reviews, the endpoint to get the reviews for your that same document
- self, the endpoint for the document itself
Response pagination
The endpoints that return lists are usually paginated. This impacts both the way you need to send your requests but also the data and metadata we send you on the responses.
KEY | REQ/RESP | WHERE |
---|---|---|
X-More-Results-Available | RESPONSE | HEADER |
X-Cursor-ID | RESPONSE | HEADER |
X-Cursor-Was-Broken | RESPONSE | HEADER |
maxRetrieve | REQUEST | QUERY |
cursorId | REQUEST | QUERY |
X-More-Results-Available
We'll send this on the response headers. It lets you know wether or not there are more results to fetch. This applies to both:
- there are so few results, that pagination is not even needed
- (OR) you reached the last page of results
X-Cursor-ID
We'll send this on the response headers. This represents the position from where to start for the next page. In order to get it, you'll need to do the same request again (same endpoint, same parameters) but change the cursorId query parameter. On this second response, you will get a new X-Cursor-ID
X-Cursor-Was-Broken
We'll send this on the response headers. You should worry with X-Cursor-ID other than making sure to send it back to us in order to get a subsequent page of data. If for some reason the value you send to us is invalid or corrupt, we'll set X-Cursor-Was-Broken to true. If all is fine, false
maxRetrieve
You should send this as a parameter in the request query string. We'll apply pagination with a default page size (we specify each endpoint's default value), but you can change the page size by setting an integer value to maxRetrieve.
cursorId
You should send this as a parameter in the request query string. If you just got a response, and the X-More-Results-Available header is true, you should do another request exactly the same as the previous one but set the cursorId query parameter to whichever value you just got in X-Cursor-ID whenever you are ready to fetch the next page.