Process a PDF document

Upload a PDF document for processing. The API performs OCR, classifies the document type, and extracts structured data according to the detected type.

Use this when you need to:

  • Extract data from invoices (vendor, amounts, line items)
  • Process receipts for expense tracking
  • Extract information from contracts
  • Digitize any PDF document into structured data

Supported Document Types

  • Invoice: Commercial invoices with line items, amounts, vendor info
  • Receipt: Point-of-sale receipts with totals and items
  • Contract: Legal agreements with parties and terms
  • CV/Resume: Employment history and skills
  • ID Document: Identity documents (passports, licenses)
  • Bank Statement: Financial records and transactions

Processing Pipeline

Each document passes through four processing stages, in order:

  1. Validation: PDF format and structure verification
  2. OCR: Text extraction using Google Cloud Vision
  3. Classification: AI-powered document type detection
  4. Extraction: Field extraction based on document type schema
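
Because validation is the first stage, a cheap client-side check can catch obviously wrong files before they are uploaded. The sketch below is an assumption about what a useful pre-check looks like, not a description of the API's own validation: it only tests for the `%PDF-` signature that PDF files begin with. The helper names `looks_like_pdf` and `read_pdf_or_raise` are hypothetical.

```python
from pathlib import Path

# PDF files begin with this signature, followed by a version such as "1.7".
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes begin with the PDF header signature.

    This is only a quick sanity check; it does not verify that the
    rest of the file is structurally valid.
    """
    return data.startswith(PDF_MAGIC)


def read_pdf_or_raise(path: str) -> bytes:
    """Read a file and raise ValueError if it does not look like a PDF."""
    data = Path(path).read_bytes()
    if not looks_like_pdf(data):
        raise ValueError(f"{path} does not start with a PDF header")
    return data
```

A file that passes this check can still be rejected by the server-side validation stage, so the API's error response should still be handled.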

Response Headers

The response includes headers for request tracing and performance monitoring:

  • X-DD-TraceId: Unique request identifier for support
  • X-DD-NumberPages: Number of pages in the document
  • X-DD-Timer-*: Processing time breakdown in milliseconds
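
The headers above can be collected into a small summary for logging or monitoring. This is a minimal sketch that works on any mapping of header names to values, so it does not depend on a particular HTTP client. The function name `summarize_headers` is hypothetical, and the timer suffixes used in the usage example (`OCR`, `Total`) are illustrative assumptions, since the actual `X-DD-Timer-*` names are not listed here.

```python
from typing import Mapping

TIMER_PREFIX = "x-dd-timer-"


def summarize_headers(headers: Mapping[str, str]) -> dict:
    """Extract the trace ID, page count, and timer values from response headers.

    Header names are compared case-insensitively. Timer values are parsed
    as milliseconds; values that are not numeric are skipped.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    timers_ms = {}
    for name, value in lowered.items():
        if name.startswith(TIMER_PREFIX):
            stage = name[len(TIMER_PREFIX):]
            try:
                timers_ms[stage] = float(value)
            except ValueError:
                continue  # ignore timer values that are not numeric

    pages = lowered.get("x-dd-numberpages")
    return {
        "trace_id": lowered.get("x-dd-traceid"),
        "pages": int(pages) if pages is not None and pages.isdigit() else None,
        "timers_ms": timers_ms,
    }
```

For example, with assumed header values:

```python
summary = summarize_headers({
    "X-DD-TraceId": "abc123",        # illustrative value
    "X-DD-NumberPages": "3",
    "X-DD-Timer-OCR": "812",         # hypothetical timer name
    "X-DD-Timer-Total": "1540.5",    # hypothetical timer name
})
```

Logging the `trace_id` alongside the timers makes it easier to quote the request identifier when contacting support.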