POST https://api.docdigitizer.com/v3/docingester/extract
Upload a PDF or image document for processing. The API performs OCR, classifies the document type, and extracts structured data based on the detected document type.
Use this when you need to:
- Extract data from invoices (vendor, amounts, line items)
- Process receipts for expense tracking
- Extract information from contracts
- Digitize any PDF or image document into structured data
Processing Pipeline
The document goes through multiple processing stages:
- Validation: PDF format and structure verification
- OCR: Text extraction using Google Cloud Vision
- Classification: AI-powered document type detection
- Extraction: Field extraction based on document type schema
Request Headers
| Header | Description | Required |
|---|---|---|
| X-API-Key | API key for authentication | Yes |
| X-DD-TraceId | Your own trace ID for request tracking. If not provided, a UUID is generated per call. | No |
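
As an illustration of the header table above, the following minimal sketch assembles the request headers in Python. The helper name `build_request_headers` and the placeholder key are hypothetical. Because the upload body format is not described in this section, the sketch stops at the headers and does not send a request.

```python
import uuid
from typing import Dict, Optional


def build_request_headers(api_key: str, trace_id: Optional[str] = None) -> Dict[str, str]:
    """Build the headers from the table above (hypothetical helper, not part of the API)."""
    if not api_key:
        # X-API-Key is marked as required.
        raise ValueError("X-API-Key is required")
    headers = {"X-API-Key": api_key}
    if trace_id is not None:
        # Optional: when omitted, the service generates a UUID per call.
        headers["X-DD-TraceId"] = trace_id
    return headers


# Supplying your own trace ID lets you correlate the call with your own logs.
headers = build_request_headers("YOUR_API_KEY", trace_id=str(uuid.uuid4()))
```

Generating the trace ID on the client is a design choice: it makes the identifier available before the request is sent, so it can be logged even if the call fails.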
Response Headers
The response includes identification and timing headers for request tracing and performance monitoring:
- X-DD-TraceId: Unique request identifier for support
- X-DD-DocumentId: System-generated document UUID
- X-DD-NumberPages: Number of pages in the document
- X-DD-Timer-*: Processing time breakdown in milliseconds
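
The response headers described above can be collected into a small summary for logging. The sketch below is one way to do that, assuming the header values arrive as plain numeric strings. The function name `summarize_response_headers` is hypothetical, and since the individual `X-DD-Timer-*` suffixes are not listed here, the code accepts whatever suffixes are present.

```python
from typing import Any, Dict, Mapping

TIMER_PREFIX = "x-dd-timer-"


def summarize_response_headers(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Collect the X-DD-* response headers into one dict (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    timers: Dict[str, float] = {}
    for name, value in lowered.items():
        if name.startswith(TIMER_PREFIX):
            try:
                # Values are documented as milliseconds; numeric format is assumed.
                timers[name[len(TIMER_PREFIX):]] = float(value)
            except ValueError:
                continue  # skip timers that are not plain numbers

    pages = lowered.get("x-dd-numberpages")
    return {
        "trace_id": lowered.get("x-dd-traceid"),
        "document_id": lowered.get("x-dd-documentid"),
        "pages": int(pages) if pages is not None and pages.isdigit() else None,
        "timers_ms": timers,
    }
```

Logging the trace ID alongside the timers keeps the identifier at hand if a slow or failed request needs to be raised with support.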
