Your First API Call

Let's extract data from a PDF document using the DocDigitizer Sync API.

Prerequisites

Before making your first API call, ensure you have:

API credentials - Request access if you don't have them yet
A PDF document to process

Quick Start

Step 1: Health Check

First, verify the API is available:

curl https://apix.docdigitizer.com/sync

Response:

I am alive
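
The same check can be scripted so that a deployment or test run fails fast when the service is unreachable. The sketch below uses only the Python standard library and assumes the health endpoint keeps answering with exactly the text shown above; the helper names `is_alive` and `check_health` and the 10-second timeout are illustrative choices, not part of the API.

```python
import urllib.request

HEALTH_URL = "https://apix.docdigitizer.com/sync"  # endpoint from Step 1


def is_alive(body: str) -> bool:
    """Return True if the body matches the documented health-check reply."""
    # Assumption: the reply is exactly "I am alive", ignoring surrounding whitespace.
    return body.strip() == "I am alive"


def check_health(url: str = HEALTH_URL, timeout: float = 10.0) -> bool:
    """GET the endpoint and compare the body with the documented reply."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return is_alive(resp.read().decode("utf-8", errors="replace"))


if __name__ == "__main__":
    print("API available:", check_health())
```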

Step 2: Process a Document

Upload a PDF for processing:

curl -X POST https://apix.docdigitizer.com/sync \
  -H "X-API-Key: your-api-key-here" \
  -F "files=@invoice.pdf" \
  -F "id=$(uuidgen)" \
  -F "contextID=$(uuidgen)"

Step 3: Receive Results

Because the call is synchronous, the extracted data comes back in the body of the same HTTP response:

{
  "StateText": "COMPLETED",
  "TraceId": "ABC1234",
  "Pipeline": "MainPipelineWithOCR",
  "NumberPages": 2,
  "Output": {
    "extractions": [
      {
        "documentType": "Invoice",
        "confidence": 0.95,
        "countryCode": "PT",
        "pageRange": {
          "start": 1,
          "end": 2
        },
        "extraction": {
          "invoiceNumber": "INV-2024-001",
          "invoiceDate": "2024-01-15",
          "vendorName": "Acme Corp",
          "vendorNIF": "123456789",
          "totalAmount": 1250.00,
          "currency": "EUR",
          "lineItems": [
            {
              "description": "Product A",
              "quantity": 2,
              "unitPrice": 500.00,
              "total": 1000.00
            }
          ]
        }
      }
    ]
  },
  "Timers": {
    "DocIngester": {
      "total": 2345.67
    }
  }
}

Request Parameters

Parameter	Required	Description
`files`	Yes	The PDF file to process
`id`	Yes	Unique document ID (UUID) - use for tracking
`contextID`	Yes	Context ID (UUID) - group related documents
`pipelineIdentifier`	No	Specific pipeline to use
`requestToken`	No	Custom trace token
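
As a rough illustration of how the table maps onto a request, the sketch below assembles the non-file form fields. The function name `build_form_fields` is hypothetical, and the guide does not say which values `pipelineIdentifier` or `requestToken` accept, so both are passed through unchanged.

```python
import uuid
from typing import Dict, Optional


def build_form_fields(
    document_id: Optional[str] = None,
    context_id: Optional[str] = None,
    pipeline_identifier: Optional[str] = None,
    request_token: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the form fields from the table (the file itself is sent separately)."""
    fields = {
        "id": document_id or str(uuid.uuid4()),
        "contextID": context_id or str(uuid.uuid4()),
    }
    # Fail early if a caller-supplied ID cannot be parsed as a UUID.
    for name in ("id", "contextID"):
        uuid.UUID(fields[name])  # raises ValueError on malformed input
    # Optional fields are included only when the caller sets them.
    if pipeline_identifier is not None:
        fields["pipelineIdentifier"] = pipeline_identifier
    if request_token is not None:
        fields["requestToken"] = request_token
    return fields
```

Reusing one `context_id` across several calls while letting `id` default to a fresh UUID is one way to group related documents, as the `contextID` description suggests.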

Understanding the Response

StateText

Value	Meaning
`COMPLETED`	Document processed successfully
`ERROR`	Processing failed - check Messages
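
One way to act on these two states is to turn anything other than `COMPLETED` into an exception that carries the details needed for debugging. This is a minimal sketch: `ProcessingError` and `require_completed` are hypothetical names, and because this guide does not show the shape of `Messages`, the value is reported as-is.

```python
from typing import Any, Dict


class ProcessingError(Exception):
    """Raised when a response does not report COMPLETED (hypothetical helper)."""


def require_completed(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return the parsed response unchanged if it completed, otherwise raise."""
    state = result.get("StateText")
    if state == "COMPLETED":
        return result
    # Keep TraceId in the error text so it can be quoted to support.
    trace_id = result.get("TraceId", "unknown")
    messages = result.get("Messages")
    raise ProcessingError(
        f"state={state!r} trace_id={trace_id} messages={messages!r}"
    )
```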

TraceId

A unique 7-character identifier for this request. Provide this when contacting support.

Output.extractions

Array of extracted documents. A multi-page PDF might contain multiple documents (e.g., several invoices).

Each extraction includes:

documentType: Detected document type (Invoice, Receipt, etc.)
confidence: Classification confidence (0-1)
countryCode: Detected country
pageRange: Which pages this extraction covers
extraction: The actual extracted fields
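
To make these fields easier to work with, the entries can be mapped onto a small typed record. The sketch below assumes every extraction carries all five fields, as in the sample response; the `Extraction` class, the helper names, and the 0.8 review threshold are illustrative and not defined by the API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Extraction:
    document_type: str
    confidence: float
    country_code: str
    first_page: int
    last_page: int
    fields: Dict[str, Any]


def parse_extractions(result: Dict[str, Any]) -> List[Extraction]:
    """Convert Output.extractions into Extraction records."""
    # Tolerate a missing or null Output, which yields an empty list.
    items = (result.get("Output") or {}).get("extractions") or []
    return [
        Extraction(
            document_type=item["documentType"],
            confidence=float(item["confidence"]),
            country_code=item["countryCode"],
            first_page=item["pageRange"]["start"],
            last_page=item["pageRange"]["end"],
            fields=item["extraction"],
        )
        for item in items
    ]


def needs_review(extractions: List[Extraction], threshold: float = 0.8) -> List[Extraction]:
    """Return extractions whose classification confidence is below the threshold."""
    # The threshold is an arbitrary example value; tune it for your documents.
    return [e for e in extractions if e.confidence < threshold]
```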

Timers

Processing time breakdown in milliseconds. Useful for performance monitoring.
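
For monitoring, the per-component totals can be pulled out of `Timers` and compared against a latency budget. This sketch assumes each timer entry is an object with a `total` key, as in the sample response, and skips anything shaped differently; the function names and the 5000 ms budget are hypothetical.

```python
from typing import Any, Dict, List


def timer_totals(result: Dict[str, Any]) -> Dict[str, float]:
    """Map each timer name to its total in milliseconds."""
    timers = result.get("Timers") or {}
    return {
        name: float(entry["total"])
        for name, entry in timers.items()
        if isinstance(entry, dict) and "total" in entry
    }


def slow_components(result: Dict[str, Any], budget_ms: float = 5000.0) -> List[str]:
    """Return the names of timers whose total exceeds the budget."""
    return [name for name, ms in timer_totals(result).items() if ms > budget_ms]
```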

Code Examples

Python

import requests
import uuid

API_KEY = "your-api-key-here"
BASE_URL = "https://apix.docdigitizer.com/sync"

# Generate unique IDs
document_id = str(uuid.uuid4())
context_id = str(uuid.uuid4())

# Upload document
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        BASE_URL,
        headers={"X-API-Key": API_KEY},
        files={"files": f},
        data={
            "id": document_id,
            "contextID": context_id
        },
        timeout=120,  # requests never times out by default; 120 s is an arbitrary choice
    )

result = response.json()

if result["StateText"] == "COMPLETED":
    for extraction in result["Output"]["extractions"]:
        print(f"Document Type: {extraction['documentType']}")
        print(f"Confidence: {extraction['confidence']}")
        print(f"Extracted Data: {extraction['extraction']}")
else:
    # Messages may be missing, so use .get() to avoid a KeyError while reporting
    print(f"Error: {result.get('Messages')}")

JavaScript

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');
const { v4: uuidv4 } = require('uuid');

const API_KEY = 'your-api-key-here';
const BASE_URL = 'https://apix.docdigitizer.com/sync';

async function processDocument(filePath) {
  const formData = new FormData();
  formData.append('files', fs.createReadStream(filePath));
  formData.append('id', uuidv4());
  formData.append('contextID', uuidv4());

  const response = await axios.post(BASE_URL, formData, {
    headers: {
      'X-API-Key': API_KEY,
      ...formData.getHeaders()
    }
  });

  const result = response.data;

  if (result.StateText === 'COMPLETED') {
    result.Output.extractions.forEach(extraction => {
      console.log('Document Type:', extraction.documentType);
      console.log('Confidence:', extraction.confidence);
      console.log('Extracted Data:', extraction.extraction);
    });
  } else {
    console.error('Error:', result.Messages);
  }
}

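// axios rejects on network failures and non-2xx responses, so handle the rejection
processDocument('invoice.pdf').catch(err => console.error('Request failed:', err.message));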

C#

using System;
using System.IO;
using System.Net.Http;
using System.Text.Json;

var apiKey = "your-api-key-here";
var baseUrl = "https://apix.docdigitizer.com/sync";

using var client = new HttpClient();
client.DefaultRequestHeaders.Add("X-API-Key", apiKey);

using var content = new MultipartFormDataContent();
content.Add(new StreamContent(File.OpenRead("invoice.pdf")), "files", "invoice.pdf");
content.Add(new StringContent(Guid.NewGuid().ToString()), "id");
content.Add(new StringContent(Guid.NewGuid().ToString()), "contextID");

var response = await client.PostAsync(baseUrl, content);
var json = await response.Content.ReadAsStringAsync();
var result = JsonSerializer.Deserialize<JsonElement>(json);

if (result.GetProperty("StateText").GetString() == "COMPLETED")
{
    var extractions = result.GetProperty("Output").GetProperty("extractions");
    foreach (var extraction in extractions.EnumerateArray())
    {
        Console.WriteLine($"Document Type: {extraction.GetProperty("documentType")}");
        Console.WriteLine($"Confidence: {extraction.GetProperty("confidence")}");
    }
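}
else
{
    // Messages may be absent, so look it up defensively
    var messages = result.TryGetProperty("Messages", out var m) ? m.ToString() : "(no Messages field)";
    Console.WriteLine($"Error: {messages}");
}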