Your First API Call
Let's extract data from a PDF document using the DocDigitizer API.
Prerequisites
Before making your first API call, ensure you have:
- An API key (available from the customer portal if you don't have one yet)
- A PDF document to process
Quick Start
Step 1: Health Check
First, verify the API is available:
curl https://api.docdigitizer.com/v3/docingester
Response:
I am alive
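The same check can be scripted before a batch run. The following is a minimal sketch using only the Python standard library; the helper names `is_alive` and `check_health` are hypothetical, and treating the reply as case- and whitespace-insensitive is an assumption rather than documented behavior.

```python
import urllib.request

# Health-check URL taken from Step 1 above.
HEALTH_URL = "https://api.docdigitizer.com/v3/docingester"


def is_alive(body: str) -> bool:
    """Return True if the health-check body matches the documented reply."""
    # Assumption: surrounding whitespace and letter case are not significant.
    return body.strip().lower() == "i am alive"


def check_health(url: str = HEALTH_URL, timeout: float = 10.0) -> bool:
    """GET the health endpoint; return False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_alive(resp.read().decode("utf-8", errors="replace"))
    except OSError:
        # urllib.error.URLError and HTTPError are both subclasses of OSError.
        return False
```

Calling `check_health()` before uploading lets a script fail fast when the service is unreachable instead of failing midway through a batch.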
Step 2: Process a Document
Upload a PDF for processing:
curl --request POST \
  --url https://api.docdigitizer.com/v3/docingester/extract \
  --header 'X-API-Key: your-api-key-here' \
  --header 'accept: application/json' \
  --header 'content-type: multipart/form-data' \
  --form files='@invoice.pdf'
Step 3: Receive Results
The API returns the extracted data in the response to the same request:
{
  "stateText": "COMPLETED",
  "traceId": "ABC1234",
  "pipeline": "MainPipelineWithOCR",
  "numberPages": 2,
  "output": {
    "extractions": [
      {
        "schemaName": "Invoice",
        "confidence": 0.95,
        "pages": [
          1,
          2
        ],
        "extraction": {
          "invoiceNumber": "INV-2024-001",
          "invoiceDate": "2024-01-15",
          "vendorName": "Acme Corp",
          "vendorNIF": "123456789",
          "totalAmount": 1250.00,
          "currency": "EUR",
          "lineItems": [
            {
              "description": "Product A",
              "quantity": 2,
              "unitPrice": 500.00,
              "total": 1000.00
            }
          ]
        }
      }
    ]
  },
  "DocumentId": "f8b969cd-fbf5-4048-8d66-ea90283d4eb9",
  "Timestamp": "2026-05-04T14:44:43.3512947Z",
  "timers": {
    "DocIngester": {
      "Total": 2345.67
    }
  }
}
Understanding the Response
stateText
| Value | Meaning |
|---|---|
| COMPLETED | Document processed successfully |
| ERROR | Processing failed - check messages |
| NO_SCHEMA | No schema available or detected for extraction |
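One way to act on these three states in client code is sketched below. The function name `extractions_or_raise` and the `ExtractionFailed` exception are hypothetical, and the shape of the `messages` field is an assumption, since this page does not show an error response.

```python
from typing import Any


class ExtractionFailed(Exception):
    """Raised when the API reports stateText == "ERROR"."""


def extractions_or_raise(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Map a parsed API response to its list of extractions."""
    state = result.get("stateText")
    # Keep the traceId in every error so it can be passed on to support.
    trace_id = result.get("traceId", "unknown")
    if state == "COMPLETED":
        return (result.get("output") or {}).get("extractions") or []
    if state == "NO_SCHEMA":
        # Nothing was extracted; treated here as an empty result, not a failure.
        return []
    if state == "ERROR":
        # Assumption: "messages" holds the failure details, whatever its shape.
        raise ExtractionFailed(f"traceId={trace_id}: {result.get('messages')}")
    raise ValueError(f"traceId={trace_id}: unexpected stateText {state!r}")
```

Treating NO_SCHEMA as an empty result is a design choice for this sketch; a pipeline that expects every upload to match a schema may prefer to raise instead.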
traceId
A unique 7-character identifier for this request. Provide this when contacting support.
output.extractions
Array of extracted documents. A multi-page PDF might contain multiple documents (e.g., several invoices).
Each extraction includes:
- schemaName: Detected schema for document extraction (Invoice, Receipt, etc.)
- confidence: Classification confidence (0-1)
- pages: Which pages this extraction covers
- extraction: The actual extracted data, structured according to the schema
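Because one PDF can yield several extractions, it can help to reduce each one to a short summary before processing the fields. The sketch below is illustrative only: `summarize_extractions` is a hypothetical helper, and the 0.8 review threshold is an arbitrary assumption, not a value recommended by the API.

```python
from typing import Any


def summarize_extractions(
    extractions: list[dict[str, Any]], min_confidence: float = 0.8
) -> list[dict[str, Any]]:
    """Return one summary per extraction, flagging low-confidence ones."""
    summaries = []
    for item in extractions:
        confidence = float(item.get("confidence", 0.0))
        summaries.append({
            "schema": item.get("schemaName"),
            "pages": list(item.get("pages", [])),
            # Names of the extracted fields, sorted for stable output.
            "fields": sorted(item.get("extraction", {})),
            # Assumption: anything below the threshold goes to manual review.
            "needs_review": confidence < min_confidence,
        })
    return summaries
```

Passing `result["output"]["extractions"]` from a COMPLETED response returns one entry per detected document, with the page numbers each one covers.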
timers
Processing time breakdown in milliseconds. Useful for performance monitoring.
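For monitoring, the nested timers object can be flattened into dotted metric names. This is a small sketch assuming the object contains only nested dictionaries and numeric values, as in the sample response; `flatten_timers` is a hypothetical helper name.

```python
from typing import Any


def flatten_timers(timers: dict[str, Any], prefix: str = "") -> dict[str, float]:
    """Flatten nested timers into {"DocIngester.Total": milliseconds, ...}."""
    flat: dict[str, float] = {}
    for name, value in timers.items():
        key = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into nested components, extending the dotted name.
            flat.update(flatten_timers(value, key))
        else:
            flat[key] = float(value)
    return flat
```

With the sample response above, `flatten_timers(result["timers"])` yields a single `DocIngester.Total` entry that can be sent to a metrics system as-is.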
Code Examples
Python
import requests

API_KEY = "your-api-key-here"
BASE_URL = "https://api.docdigitizer.com/v3/docingester/extract"

# Upload the document as multipart/form-data under the "files" field
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        BASE_URL,
        headers={"X-API-Key": API_KEY},
        files={"files": f},
    )

result = response.json()

# Key casing follows the sample response above ("stateText", not "StateText")
if result["stateText"] == "COMPLETED":
    for extraction in result["output"]["extractions"]:
        print(f"Document Type: {extraction['schemaName']}")
        print(f"Confidence: {extraction['confidence']}")
        print(f"Extracted Data: {extraction['extraction']}")
else:
    print(f"Error: {result.get('messages')}")
JavaScript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

const API_KEY = 'your-api-key-here';
const BASE_URL = 'https://api.docdigitizer.com/v3/docingester/extract';

async function processDocument(filePath) {
  // Send the PDF as multipart/form-data under the "files" field
  const formData = new FormData();
  formData.append('files', fs.createReadStream(filePath));

  const response = await axios.post(BASE_URL, formData, {
    headers: {
      'X-API-Key': API_KEY,
      ...formData.getHeaders()
    }
  });

  const result = response.data;

  // Key casing follows the sample response above ("stateText", not "StateText")
  if (result.stateText === 'COMPLETED') {
    result.output.extractions.forEach(extraction => {
      console.log('Document Type:', extraction.schemaName);
      console.log('Confidence:', extraction.confidence);
      console.log('Extracted Data:', extraction.extraction);
    });
  } else {
    console.error('Error:', result.messages);
  }
}

// Report network or HTTP failures instead of leaving the promise unhandled
processDocument('invoice.pdf').catch(err => console.error('Request failed:', err.message));
C#
using System;
using System.IO;
using System.Net.Http;
using System.Text.Json;

var apiKey = "your-api-key-here";
var baseUrl = "https://api.docdigitizer.com/v3/docingester/extract";

using var client = new HttpClient();
client.DefaultRequestHeaders.Add("X-API-Key", apiKey);

// Send the PDF as multipart/form-data under the "files" field
using var content = new MultipartFormDataContent();
content.Add(new StreamContent(File.OpenRead("invoice.pdf")), "files", "invoice.pdf");

var response = await client.PostAsync(baseUrl, content);
var json = await response.Content.ReadAsStringAsync();
var result = JsonSerializer.Deserialize<JsonElement>(json);

// Key casing follows the sample response above ("stateText", not "StateText")
if (result.GetProperty("stateText").GetString() == "COMPLETED")
{
    var extractions = result.GetProperty("output").GetProperty("extractions");
    foreach (var extraction in extractions.EnumerateArray())
    {
        Console.WriteLine($"Document Type: {extraction.GetProperty("schemaName")}");
        Console.WriteLine($"Confidence: {extraction.GetProperty("confidence")}");
    }
}
