Your First API Call

Let's extract data from a PDF document using the DocDigitizer API.

Prerequisites

Before making your first API call, ensure you have:

  1. An API key (get one from the customer portal if you don't have one yet)
  2. A PDF document to process

Quick Start

Step 1: Health Check

First, verify the API is available:

curl https://api.docdigitizer.com/v3/docingester

Response:

I am alive
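If you script your integration, you can run the same check before sending documents. Below is a minimal Python sketch using only the standard library. It assumes the endpoint returns the plain-text body shown above. The helper names (fetch_text, wait_until_alive) and the retry settings are illustrative, not part of the API.

```python
import time
import urllib.request
from typing import Callable

HEALTH_URL = "https://api.docdigitizer.com/v3/docingester"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """GET the URL and return the response body decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def wait_until_alive(
    fetch: Callable[[str], str] = fetch_text,
    attempts: int = 3,
    delay: float = 1.0,
) -> bool:
    """Return True once the health endpoint answers 'I am alive'.

    Assumption: the body is exactly the documented text, possibly with
    surrounding whitespace. Retry count and delay are arbitrary defaults.
    """
    for attempt in range(attempts):
        try:
            if fetch(HEALTH_URL).strip() == "I am alive":
                return True
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses; retry
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Call wait_until_alive() with no arguments to hit the real endpoint. The fetch parameter exists so the check can be exercised without network access.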

Step 2: Process a Document

Upload a PDF for processing. Send it as multipart form data in the files field, with your API key in the X-API-Key header:

curl --request POST \
     --url https://api.docdigitizer.com/v3/docingester/extract \
     --header 'X-API-Key: your-api-key-here' \
     --header 'accept: application/json' \
     --header 'content-type: multipart/form-data' \
     --form files='@invoice.pdf'
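If you prefer not to install third-party packages, you can make the same request with Python's standard library by building the multipart body yourself. This is a sketch, not an official client. build_multipart and extract are hypothetical helper names, and the 120-second timeout is an arbitrary assumption you should tune for your file sizes.

```python
import urllib.request
import uuid
from pathlib import Path

EXTRACT_URL = "https://api.docdigitizer.com/v3/docingester/extract"


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the body and the matching Content-Type header value. The boundary
    is a random 32-hex-digit string, so a collision with file bytes is
    vanishingly unlikely (though not strictly impossible).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract(path: str, api_key: str, timeout: float = 120.0) -> bytes:
    """POST the file in the 'files' field, mirroring the curl call in Step 2.

    Returns the raw JSON bytes; decode them with json.loads().
    """
    file_path = Path(path)
    body, ctype = build_multipart("files", file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Accept": "application/json",
            "Content-Type": ctype,
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```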

Step 3: Receive Results

The API processes the document synchronously and returns the extracted data in the response body:

{
  "stateText": "COMPLETED",
  "traceId": "ABC1234",
  "pipeline": "MainPipelineWithOCR",
  "numberPages": 2,
  "output": {
    "extractions": [
      {
        "schemaName": "Invoice",
        "confidence": 0.95,
        "pages": [
          1,
          2
        ],
        "extraction": {
          "invoiceNumber": "INV-2024-001",
          "invoiceDate": "2024-01-15",
          "vendorName": "Acme Corp",
          "vendorNIF": "123456789",
          "totalAmount": 1250.00,
          "currency": "EUR",
          "lineItems": [
            {
              "description": "Product A",
              "quantity": 2,
              "unitPrice": 500.00,
              "total": 1000.00
            }
          ]
        }
      }
    ]
  },
  "DocumentId": "f8b969cd-fbf5-4048-8d66-ea90283d4eb9",
  "Timestamp": "2026-05-04T14:44:43.3512947Z",
  "timers": {
    "DocIngester": {
      "Total": 2345.67
    }
  }
}
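To work with the response in a typed way, you can map it onto small data classes. This sketch assumes the field names shown in the sample above. Which fields are always present (for example, whether output appears on ERROR responses) is an assumption, so optional fields fall back to defaults.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Extraction:
    schema_name: str
    confidence: float
    pages: list[int]
    data: dict[str, Any]


@dataclass
class ExtractResult:
    state: str
    trace_id: str
    number_pages: int
    extractions: list[Extraction] = field(default_factory=list)


def parse_result(payload: str) -> ExtractResult:
    """Map the JSON response onto typed objects (field names from the sample)."""
    raw = json.loads(payload)
    # Tolerate a missing or null output / extractions (assumed possible on errors)
    items = (raw.get("output") or {}).get("extractions") or []
    return ExtractResult(
        state=raw["stateText"],
        trace_id=raw.get("traceId", ""),
        number_pages=raw.get("numberPages", 0),
        extractions=[
            Extraction(
                schema_name=item["schemaName"],
                confidence=float(item["confidence"]),
                pages=list(item.get("pages", [])),
                data=item.get("extraction", {}),
            )
            for item in items
        ],
    )
```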

Understanding the Response

stateText

Indicates the outcome of processing:

  • COMPLETED: The document was processed successfully
  • ERROR: Processing failed; check the messages field for details
  • NO_SCHEMA: No schema was available or detected for extraction
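One way to branch on stateText in client code is shown below. Treating NO_SCHEMA as a non-fatal outcome and ERROR as an exception is a design choice of this sketch. ExtractionFailed and check_state are hypothetical names, not something the API provides.

```python
from enum import Enum
from typing import Any


class State(str, Enum):
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"
    NO_SCHEMA = "NO_SCHEMA"


class ExtractionFailed(Exception):
    """Hypothetical exception raised when the API reports ERROR."""


def check_state(result: dict[str, Any]) -> bool:
    """Return True if extractions are available, False for NO_SCHEMA.

    Raises ExtractionFailed for ERROR, and ValueError for any value not
    listed in the documentation (State(...) rejects unknown strings).
    """
    state = State(result.get("stateText"))
    if state is State.COMPLETED:
        return True
    if state is State.NO_SCHEMA:
        return False
    raise ExtractionFailed(
        f"{result.get('messages')} (traceId={result.get('traceId')})"
    )
```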

traceId

A unique 7-character identifier for this request. Provide this when contacting support.
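Because support requests need the traceId, it helps to attach it to every log line written while handling a response. A minimal sketch using the standard logging module follows. TraceIdAdapter and logger_for are illustrative names, and the "unknown" fallback for responses without a traceId is an assumption.

```python
import logging
from typing import Any


class TraceIdAdapter(logging.LoggerAdapter):
    """Prefix every log message with the request's traceId."""

    def process(self, msg, kwargs):
        return f"[traceId={self.extra['trace_id']}] {msg}", kwargs


def logger_for(result: dict[str, Any], base: logging.Logger) -> TraceIdAdapter:
    """Wrap a logger so its messages carry the traceId from a response dict."""
    return TraceIdAdapter(base, {"trace_id": result.get("traceId", "unknown")})
```

Usage: log = logger_for(result, logging.getLogger(__name__)), then log.error("extraction failed") emits "[traceId=ABC1234] extraction failed" for the sample response.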

output.extractions

An array with one entry per document detected in the file. A multi-page PDF can contain several documents (for example, several invoices), each returned as a separate extraction.

Each extraction includes:

  • schemaName: The schema detected for the document and used for extraction (Invoice, Receipt, etc.)
  • confidence: Classification confidence, from 0 to 1
  • pages: The page numbers this extraction covers
  • extraction: The extracted data, structured according to the schema
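For example, a PDF containing two invoices and a receipt would produce three entries. The sketch below groups extractions by schemaName and maps each page back to the schema that covers it. The function names and the min_confidence threshold are illustrative choices of this sketch, not API settings.

```python
from collections import defaultdict
from typing import Any


def _extractions(result: dict[str, Any]) -> list[dict[str, Any]]:
    # Tolerate a missing or null output / extractions
    return (result.get("output") or {}).get("extractions") or []


def by_schema(result: dict[str, Any],
              min_confidence: float = 0.0) -> dict[str, list[dict[str, Any]]]:
    """Group extractions by schemaName, dropping those below min_confidence."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in _extractions(result):
        if item.get("confidence", 0.0) >= min_confidence:
            groups[item["schemaName"]].append(item)
    return dict(groups)


def page_owner(result: dict[str, Any]) -> dict[int, str]:
    """Map each page number to the schema of the extraction covering it.

    If two extractions list the same page, the later one wins.
    """
    owner: dict[int, str] = {}
    for item in _extractions(result):
        for page in item.get("pages", []):
            owner[page] = item["schemaName"]
    return owner
```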

timers

Processing time breakdown per component, in milliseconds (for example, timers.DocIngester.Total in the sample above). Useful for performance monitoring.
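The sample response only shows DocIngester.Total, but the object is nested, so other components may appear. A small helper (hypothetical, not part of the API) can flatten it into dotted keys before sending the values to a metrics system:

```python
from typing import Any


def flatten_timers(timers: dict[str, Any], prefix: str = "") -> dict[str, float]:
    """Flatten nested timer objects into dotted keys, e.g. 'DocIngester.Total'.

    Non-numeric leaves are skipped; values stay in milliseconds.
    """
    flat: dict[str, float] = {}
    for name, value in timers.items():
        key = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten_timers(value, prefix=f"{key}."))
        elif isinstance(value, (int, float)):
            flat[key] = float(value)
    return flat
```

For the sample response, flatten_timers(result["timers"]) gives {"DocIngester.Total": 2345.67}, roughly 2.3 seconds of processing.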

Code Examples

Python

import requests

API_KEY = "your-api-key-here"
BASE_URL = "https://api.docdigitizer.com/v3/docingester/extract"

# Upload the document as multipart form data in the "files" field
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        BASE_URL,
        headers={"X-API-Key": API_KEY, "Accept": "application/json"},
        files={"files": ("invoice.pdf", f, "application/pdf")},
        timeout=120,  # seconds; adjust for large documents
    )

result = response.json()

# Response fields are camelCase (stateText, not StateText)
if result["stateText"] == "COMPLETED":
    for extraction in result["output"]["extractions"]:
        print(f"Document Type: {extraction['schemaName']}")
        print(f"Confidence: {extraction['confidence']}")
        print(f"Extracted Data: {extraction['extraction']}")
else:
    # Include the traceId when reporting problems to support
    print(f"{result['stateText']}: {result.get('messages')} (traceId: {result.get('traceId')})")

JavaScript

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

const API_KEY = 'your-api-key-here';
const BASE_URL = 'https://api.docdigitizer.com/v3/docingester/extract';

async function processDocument(filePath) {
  // Attach the PDF under the "files" field, as in the curl example
  const formData = new FormData();
  formData.append('files', fs.createReadStream(filePath));

  const response = await axios.post(BASE_URL, formData, {
    headers: {
      'X-API-Key': API_KEY,
      accept: 'application/json',
      ...formData.getHeaders() // multipart Content-Type including the boundary
    }
  });

  const result = response.data;

  // Response fields are camelCase (stateText, not StateText)
  if (result.stateText === 'COMPLETED') {
    result.output.extractions.forEach(extraction => {
      console.log('Document Type:', extraction.schemaName);
      console.log('Confidence:', extraction.confidence);
      console.log('Extracted Data:', extraction.extraction);
    });
  } else {
    // Include the traceId when reporting problems to support
    console.error(`${result.stateText}:`, result.messages, `(traceId: ${result.traceId})`);
  }
}

processDocument('invoice.pdf').catch(err => console.error('Request failed:', err.message));

C#

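using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;

var apiKey = "your-api-key-here";
var baseUrl = "https://api.docdigitizer.com/v3/docingester/extract";

using var client = new HttpClient();
client.DefaultRequestHeaders.Add("X-API-Key", apiKey);
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

// Attach the PDF under the "files" field, as in the curl example
using var content = new MultipartFormDataContent();
var fileContent = new StreamContent(File.OpenRead("invoice.pdf"));
fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
content.Add(fileContent, "files", "invoice.pdf"); // disposed together with content

var response = await client.PostAsync(baseUrl, content);
var json = await response.Content.ReadAsStringAsync();
var result = JsonSerializer.Deserialize<JsonElement>(json);

// Response fields are camelCase (stateText, not StateText)
var state = result.GetProperty("stateText").GetString();
if (state == "COMPLETED")
{
    var extractions = result.GetProperty("output").GetProperty("extractions");
    foreach (var extraction in extractions.EnumerateArray())
    {
        Console.WriteLine($"Document Type: {extraction.GetProperty("schemaName")}");
        Console.WriteLine($"Confidence: {extraction.GetProperty("confidence")}");
        Console.WriteLine($"Extracted Data: {extraction.GetProperty("extraction")}");
    }
}
else
{
    var messages = result.TryGetProperty("messages", out var m) ? m.ToString() : "(none)";
    Console.WriteLine($"{state}: {messages}");
}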

Next Steps