Process a document for data extraction

POST

https://api.docdigitizer.com/v3/docingester/extract

Upload a PDF or image document for processing. The API performs OCR, classifies the document type, and extracts structured data based on the detected document type.

Use this when you need to:

Extract data from invoices (vendor, amounts, line items)
Process receipts for expense tracking
Extract information from contracts
Digitize any PDF or image document into structured data
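The following is a minimal sketch of building an upload request with only the Python standard library. It assumes the document is sent as a single multipart/form-data part. The form field name `file` and the helper name `build_extract_request` are hypothetical, so check the endpoint's request-body schema for the actual field name.

```python
import mimetypes
import urllib.request
import uuid

EXTRACT_URL = "https://api.docdigitizer.com/v3/docingester/extract"


def build_extract_request(
    data: bytes, filename: str, api_key: str, field_name: str = "file"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to the extract endpoint.

    ASSUMPTION: the form field name ("file" by default) is hypothetical.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    part_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    # Assemble a single-part multipart body by hand (the stdlib has no helper).
    body = part_header.encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,  # required authentication header
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen(req)`; the returned response carries the headers described under Response Headers.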

Processing Pipeline

The document goes through multiple processing stages:

Validation: PDF format and structure verification
OCR: Text extraction using Google Cloud Vision
Classification: AI-powered document type detection
Extraction: Field extraction based on document type schema

Request Headers

Header	Description	Required
X-API-Key	API key for authentication	Yes
X-DD-TraceId	Your own trace ID for request tracking. If not provided, a UUID is generated per call.	No
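A small sketch of assembling the documented request headers, assuming a hypothetical helper named `make_request_headers`. `X-API-Key` is always sent. `X-DD-TraceId` is added only when you supply one, leaving the service to generate a UUID otherwise.

```python
from typing import Optional


def make_request_headers(api_key: str, trace_id: Optional[str] = None) -> dict:
    """Return the request headers for the extract endpoint.

    X-API-Key is required. X-DD-TraceId is optional; when it is omitted,
    the service generates a UUID for the call.
    """
    if not api_key:
        raise ValueError("X-API-Key is required")
    headers = {"X-API-Key": api_key}
    if trace_id is not None:
        # Send your own ID so you can correlate the call with your own logs.
        headers["X-DD-TraceId"] = trace_id
    return headers
```

Passing your own value (for example `str(uuid.uuid4())`, or an ID already used by your logging) lets you track the request on your side without waiting for the response.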

Response Headers

The response includes identification headers and timing headers for performance monitoring:

X-DD-TraceId: Trace identifier for the request; quote it when contacting support
X-DD-DocumentId: System-generated document UUID
X-DD-NumberPages: Number of pages in the document
X-DD-Timer-*: Processing time breakdown in milliseconds
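A sketch of collecting these headers from a response, assuming a hypothetical helper `summarize_response_headers`. It accepts any `(name, value)` pairs, such as `response.headers.items()` from `urllib`, and matches names case-insensitively. It assumes each `X-DD-Timer-*` value is a plain number of milliseconds; the timer suffixes themselves are not listed here, so the example names `OCR` and `Total` in the usage are hypothetical.

```python
from typing import Iterable, Tuple

TIMER_PREFIX = "x-dd-timer-"


def summarize_response_headers(items: Iterable[Tuple[str, str]]) -> dict:
    """Collect the X-DD-* response headers into a plain dict.

    ASSUMPTION: every X-DD-Timer-* value parses as a number (milliseconds).
    """
    summary = {"trace_id": None, "document_id": None, "pages": None, "timers_ms": {}}
    for name, value in items:
        key = name.lower()  # HTTP header names are case-insensitive
        if key == "x-dd-traceid":
            summary["trace_id"] = value
        elif key == "x-dd-documentid":
            summary["document_id"] = value
        elif key == "x-dd-numberpages":
            summary["pages"] = int(value)
        elif key.startswith(TIMER_PREFIX):
            # Keep the stage suffix as sent, e.g. "OCR" from "X-DD-Timer-OCR".
            summary["timers_ms"][name[len(TIMER_PREFIX):]] = float(value)
    return summary


example = summarize_response_headers([
    ("X-DD-TraceId", "t-1"),
    ("X-DD-NumberPages", "3"),
    ("X-DD-Timer-OCR", "120"),  # hypothetical timer suffix
])
```

Keeping the timers in a dictionary keyed by suffix means new processing stages show up without code changes.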