Process a document for data extraction

Upload a PDF or image document for processing. The API performs OCR, classifies the document type, and extracts structured data based on the detected document type.

Use this when you need to:

  • Extract data from invoices (vendor, amounts, line items)
  • Process receipts for expense tracking
  • Extract information from contracts
  • Digitize any PDF or image document into structured data

Processing Pipeline

Each document passes through four processing stages:

  1. Validation: PDF format and structure verification
  2. OCR: Text extraction using Google Cloud Vision
  3. Classification: AI-powered document type detection
  4. Extraction: Field extraction based on document type schema
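Because validation is the first stage, a cheap client-side check can catch obviously wrong files before upload. The sketch below is not the service's validation logic; it only inspects well-known leading file signatures. The function name `sniff_document_type` is hypothetical, and the choice of PNG and JPEG as image formats is an assumption, since this page does not list the accepted image types.

```python
from typing import Optional

# Well-known file signatures (magic bytes).
# Assumption: PNG and JPEG are among the accepted image formats; this page
# does not say which image types the API supports.
_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_document_type(data: bytes) -> Optional[str]:
    """Return 'pdf', 'png', 'jpeg', or None, based on the leading bytes."""
    for signature, kind in _SIGNATURES.items():
        if data.startswith(signature):
            return kind
    return None
```

A `None` result means the file matches none of the listed signatures and is probably worth checking locally before it is sent.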

Request Headers

Header       | Description                                                                            | Required
X-API-Key    | API key for authentication                                                             | Yes
X-DD-TraceId | Your own trace ID for request tracking. If not provided, a UUID is generated per call. | No
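As a minimal sketch of how these two headers might be assembled on the client, the helper below builds the header dictionary and omits X-DD-TraceId when no trace ID is supplied, so that the service generates one. The function name `build_request_headers` is hypothetical. The endpoint URL and the upload body format are not stated on this page, so they are deliberately left out.

```python
from typing import Dict, Optional


def build_request_headers(api_key: str, trace_id: Optional[str] = None) -> Dict[str, str]:
    """Build the request headers described above.

    X-API-Key is required. X-DD-TraceId is optional; when it is omitted,
    the service generates a UUID for the call.
    """
    if not api_key:
        raise ValueError("X-API-Key is required")
    headers = {"X-API-Key": api_key}
    if trace_id:
        # Only send a trace ID when the caller supplies one.
        headers["X-DD-TraceId"] = trace_id
    return headers
```

To correlate a call with your own logs, one option is to pass a client-generated value such as `str(uuid.uuid4())` as `trace_id` and record it before sending the request.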

Response Headers

The response includes headers for request tracking and performance monitoring:

  • X-DD-TraceId: Unique request identifier for support
  • X-DD-DocumentId: System-generated document UUID
  • X-DD-NumberPages: Number of pages in the document
  • X-DD-Timer-*: Processing time breakdown in milliseconds
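The headers listed above can be collected into a small structure for logging or monitoring. The following is a sketch under stated assumptions: `ResponseInfo` and `parse_response_headers` are hypothetical names, the timer values are assumed to be plain numeric strings, and the suffixes that follow `X-DD-Timer-` are not documented on this page, so the code keeps whatever suffix it finds. Header names are matched case-insensitively, as HTTP header names are not case-sensitive.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional

TIMER_PREFIX = "x-dd-timer-"


@dataclass
class ResponseInfo:
    trace_id: Optional[str] = None
    document_id: Optional[str] = None
    number_pages: Optional[int] = None
    # Maps the lowercased timer suffix to a duration in milliseconds.
    timers_ms: Dict[str, float] = field(default_factory=dict)


def parse_response_headers(headers: Mapping[str, str]) -> ResponseInfo:
    """Pick the X-DD-* headers out of a response header mapping."""
    info = ResponseInfo()
    for name, value in headers.items():
        key = name.lower()
        if key == "x-dd-traceid":
            info.trace_id = value
        elif key == "x-dd-documentid":
            info.document_id = value
        elif key == "x-dd-numberpages":
            info.number_pages = int(value)
        elif key.startswith(TIMER_PREFIX):
            # Assumption: timer values are numeric strings in milliseconds.
            info.timers_ms[key[len(TIMER_PREFIX):]] = float(value)
    return info
```

Logging `trace_id` together with `timers_ms` for every call keeps the identifier that support asks for next to the timing breakdown.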