Process a document for data extraction

Upload a PDF or image document for processing. The API performs OCR, classifies the document type, and extracts structured data based on the detected document type.
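A minimal client-side sketch of building one upload request is shown here, using only the Python standard library. It makes several assumptions that this page does not confirm. The endpoint URL is a placeholder. The upload is assumed to be multipart/form-data, with the document in a form field named "file" (the body parameter name listed on this page). The function name `build_upload_request` is hypothetical. The request is built but not sent.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Placeholder: this page does not state the endpoint URL.
API_URL = "https://api.example.com/process"


def build_upload_request(path, api_key, trace_id=None, url=API_URL):
    """Build (but do not send) a POST request that uploads one document.

    Assumes multipart/form-data with the document in a field named "file".
    """
    p = Path(path)
    content_type = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
    # A random boundary is very unlikely to appear inside the file bytes.
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{p.name}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        p.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "X-API-Key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    # Send X-DD-TraceId only when the caller has one; otherwise the API
    # generates a UUID for this call.
    if trace_id is not None:
        headers["X-DD-TraceId"] = trace_id
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To send the request, pass it to `urllib.request.urlopen(req)`. That call raises `urllib.error.HTTPError` for 4xx and 5xx responses.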

Use this when you need to:

  • Extract data from invoices (vendor, amounts, line items)
  • Process receipts for expense tracking
  • Extract information from contracts
  • Digitize any PDF or image document into structured data

Processing Pipeline

Each document passes through four processing stages, in order:

  1. Validation: File format and structure verification
  2. OCR: Text extraction using Google Cloud Vision
  3. Classification: AI-powered document type detection
  4. Extraction: Field extraction based on document type schema

Request Headers

Header | Description | Required
X-API-Key | API key for authentication | Yes
X-DD-TraceId | Your own trace ID for request tracking. If not provided, a UUID is generated per call. | No

Response Headers

The response includes identification and timing headers for request tracking and performance monitoring:

  • X-DD-TraceId: Trace identifier for the request (the value you supplied, or a generated UUID); include it when contacting support
  • X-DD-DocumentId: System-generated document UUID
  • X-DD-NumberPages: Number of pages in the document
  • X-DD-Timer-*: Processing time breakdown in milliseconds
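The headers listed above can be collected into one summary on the client, as in this small sketch. The function name `parse_response_headers` is hypothetical. This page does not list the concrete `X-DD-Timer-*` suffixes, so the sketch keeps whatever suffixes arrive (lowercased) and parses their values as milliseconds. The suffix names used in the test, such as "OCR" and "Total", are made up for illustration.

```python
def parse_response_headers(headers):
    """Collect the X-DD-* response headers into a summary dict.

    `headers` is any mapping of header name -> string value, for example
    dict(response.headers). Names are matched case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    prefix = "x-dd-timer-"
    timers = {}
    for name, value in lowered.items():
        if name.startswith(prefix):
            # Documented as milliseconds; float() also accepts whole numbers.
            timers[name[len(prefix):]] = float(value)
    pages = lowered.get("x-dd-numberpages")
    return {
        "trace_id": lowered.get("x-dd-traceid"),
        "document_id": lowered.get("x-dd-documentid"),
        "number_pages": int(pages) if pages is not None else None,
        "timers_ms": timers,
    }
```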

Body Params

file (required)

The document file to process. Supported formats: PDF, JPEG, PNG, TIFF, BMP, WebP, GIF. The file is validated both by extension and by magic bytes (content verification). Maximum file size: 30 MB.
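Rejected uploads can be avoided by pre-checking files locally against the same rules: supported extension, matching magic bytes, and the size limit. The following is a sketch of that check. The server's exact checks are not published here, so treat this as an approximation. Two details are assumptions: "30 MB" is read as 30 MiB, and only the leading signature bytes of each format are compared. The names `precheck` and `sniff_format` are hypothetical.

```python
from pathlib import Path

# Assumes "30 MB" means 30 MiB; the server may count differently.
MAX_BYTES = 30 * 1024 * 1024

# Leading "magic bytes" for each supported format (WebP is handled separately).
SIGNATURES = {
    "pdf": [b"%PDF-"],
    "jpeg": [b"\xff\xd8\xff"],
    "png": [b"\x89PNG\r\n\x1a\n"],
    "tiff": [b"II*\x00", b"MM\x00*"],
    "bmp": [b"BM"],
    "gif": [b"GIF87a", b"GIF89a"],
}

EXTENSIONS = {
    ".pdf": "pdf", ".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png",
    ".tif": "tiff", ".tiff": "tiff", ".bmp": "bmp", ".webp": "webp", ".gif": "gif",
}


def sniff_format(head: bytes):
    """Return the format implied by the first bytes of a file, or None."""
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    for fmt, sigs in SIGNATURES.items():
        if any(head.startswith(sig) for sig in sigs):
            return fmt
    return None


def precheck(path):
    """Return the file's format, or raise ValueError if it would likely be rejected."""
    p = Path(path)
    ext_fmt = EXTENSIONS.get(p.suffix.lower())
    if ext_fmt is None:
        raise ValueError(f"unsupported extension: {p.suffix!r}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; limit is {MAX_BYTES}")
    with p.open("rb") as f:
        head = f.read(16)
    content_fmt = sniff_format(head)
    if content_fmt != ext_fmt:
        raise ValueError(f"extension says {ext_fmt}, content says {content_fmt}")
    return ext_fmt
```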

Headers

X-DD-TraceId (string, optional)

Your own trace ID for this request. If you provide one, the same value is echoed back in the X-DD-TraceId response header and included in the response body as traceId. If not provided, a new UUID is generated automatically for each call.
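The following sketch generates a client-side trace ID and confirms that the API echoed it back. The function names are hypothetical. Two points are assumptions. First, any unique string is assumed to be an acceptable trace ID, so a UUID4 is used. Second, when the body also contains `traceId`, it is assumed to match the header even if the server generated the ID itself.

```python
import uuid


def new_trace_id():
    """Generate a client-side trace ID (assumes any unique string is accepted)."""
    return str(uuid.uuid4())


def echoed_trace_id(sent, response_headers, response_body):
    """Return the trace ID the API reported, checking it against what was sent.

    If `sent` is None, the server generated its own UUID, which is simply returned.
    """
    header_value = None
    for name, value in response_headers.items():
        if name.lower() == "x-dd-traceid":
            header_value = value
            break
    body_value = response_body.get("traceId")
    if header_value is not None and body_value is not None and header_value != body_value:
        raise ValueError("header and body trace IDs differ")
    reported = header_value if header_value is not None else body_value
    if sent is not None and reported != sent:
        raise ValueError(f"sent trace ID {sent!r} but got {reported!r}")
    return reported
```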

Responses

Responses are returned as application/json.