Python SDK

Installation

pip install docdigitizer

Requirements: Python 3.10+. The only runtime dependency is requests.

Quick Start

from docdigitizer import DocDigitizer

client = DocDigitizer(api_key="your-api-key")
result = client.process_document("exam.pdf")

print(result.output.doc_type)   # "MedicalOrder"
print(result.output.country)    # "PT"

for field in result.output.fields:
    print(f"{field.name}: {field.value}")

What do I get back?

process_document() returns a ProcessingResult. Here's a real example:

result.output.doc_type     → "MedicalOrder"
result.output.confidence   → 0
result.output.country      → "PT"
result.output.fields       → [
    ExtractionField(name="patientName", value="JOÃO JOSÉ DA COSTA FERNANDES"),
    ExtractionField(name="patientId", value="1527620"),
    ExtractionField(name="examDescription", value="ECO OSTEOARTICULAR - JOELHO"),
    ExtractionField(name="requestingDoctor", value="Fernando Marques Moura"),
    ExtractionField(name="clinicalInformation", value="dores no joelho, ..."),
    ...
]
result.trace_id            → "4ESZ13H"
result.document_id         → None

`ProcessingResult`

| Field | Type | Description |
|---|---|---|
| `output` | `ExtractionOutput` | Extracted data (see below) |
| `trace_id` | `str \| None` | Trace ID for debugging |
| `document_id` | `str \| None` | Document identifier |
| `context_id` | `str \| None` | Context identifier |
| `timers` | `dict` | Processing time breakdown |
| `headers` | `dict` | Raw response headers |
| `raw` | `dict` | Full raw JSON response |

`ExtractionOutput`

| Field | Type | Description |
|---|---|---|
| `doc_type` | `str \| None` | Detected document type (e.g. `"MedicalOrder"`, `"INV"`) |
| `confidence` | `float \| None` | Classification confidence (0.0 – 1.0) |
| `country` | `str \| None` | Detected country code (e.g. `"PT"`) |
| `fields` | `list[ExtractionField]` | Extracted key-value fields |
| `raw` | `dict` | Raw output data |
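For logging and debugging, it helps to reduce a result to a small JSON record that keeps the `trace_id`. The sketch below does this using only the `ProcessingResult` and `ExtractionOutput` attributes documented above. `result_summary` and `result_log_line` are hypothetical helpers and are not part of the SDK:

```python
import json
from typing import Any


def result_summary(result: Any) -> dict[str, Any]:
    """Flatten documented ProcessingResult/ExtractionOutput attributes into a plain dict."""
    output = result.output
    return {
        "trace_id": result.trace_id,  # documented as "Trace ID for debugging"
        "document_id": result.document_id,
        "context_id": result.context_id,
        "doc_type": output.doc_type,
        "country": output.country,
        "confidence": output.confidence,
        "field_names": [field.name for field in output.fields],
    }


def result_log_line(result: Any) -> str:
    """One JSON line per processed document; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(result_summary(result), ensure_ascii=False)
```

To use it, write `print(result_log_line(result))` after `process_document()`. Extracted values are left out on purpose, because documents like medical orders may contain personal data.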

`ExtractionField`

| Field | Type | Description |
|---|---|---|
| `name` | `str` | Field name (e.g. `"patientName"`, `"total"`) |
| `value` | `Any` | Extracted value |
| `confidence` | `float \| None` | Extraction confidence (0.0 – 1.0), if available |
| `bounding_box` | `dict \| None` | Bounding box coordinates on the page, if available |

Ways to send the file

# 1. Path string
result = client.process_document("path/to/document.pdf")

# 2. pathlib.Path
from pathlib import Path
result = client.process_document(Path("documents") / "exam.pdf")

# 3. File-like object (binary mode)
with open("document.pdf", "rb") as f:
    result = client.process_document(f)
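Option 3 also covers bytes you already hold in memory, for example a file downloaded from object storage. The sketch below assumes `process_document` accepts any readable binary file-like object, as the list above suggests. It also sets a `.name` on the buffer in case the upload code uses it as the filename, which this page does not confirm. `process_bytes` is a hypothetical helper, and `FakeClient` exists only so the code runs offline:

```python
import io
from typing import Any


def process_bytes(client: Any, data: bytes, filename: str = "document.pdf", **kwargs: Any) -> Any:
    """Send in-memory file bytes through client.process_document()."""
    buffer = io.BytesIO(data)
    # Assumption: the upload may read .name to choose a filename, so give it one.
    buffer.name = filename
    return client.process_document(buffer, **kwargs)


class FakeClient:
    """Offline stand-in that reports what it received instead of calling the API."""

    def process_document(self, file: Any, **kwargs: Any) -> dict[str, Any]:
        return {"name": file.name, "size": len(file.read()), "kwargs": kwargs}
```

In real code, pass your `DocDigitizer` client: `result = process_bytes(client, pdf_bytes, filename="exam.pdf")`.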

Optional parameters

result = client.process_document(
    "document.pdf",
    document_id="550e8400-e29b-41d4-a716-446655440000",
    context_id="7c9e6679-7425-40de-944b-e07fc1f90ae7",
    doc_type="MedicalOrder",
    country="PT",
    extra_params={"customField": "value"},
)

| Parameter | Type | Why use it? |
|---|---|---|
| `document_id` | `str` | Attach your own ID to the document for tracking |
| `context_id` | `str` | Group related documents together (e.g. same workflow) |
| `doc_type` | `str` | Hint the document type (e.g. `"INV"`, `"MedicalOrder"`) for better extraction |
| `country` | `str` | Hint the country (e.g. `"PT"`) for better extraction |
| `extra_params` | `dict` | Send additional parameters specific to your use case |

Errors

The SDK raises typed exceptions. In practice, two exception classes cover the three cases that matter:

from docdigitizer import DocDigitizer, AuthenticationError, APIError

client = DocDigitizer(api_key="your-api-key")

try:
    result = client.process_document("document.pdf")
except AuthenticationError:
    # Invalid or expired API key — don't retry, fix credentials
    print("Invalid credentials")
except APIError as e:
    if e.status_code in (503, 504):
        # Service temporarily unavailable — worth retrying with backoff
        print(f"Service unavailable ({e.status_code}); retry later with backoff")
    else:
        # Other API errors (400, 404, 500, ...) — typically not worth retrying
        print(f"API error: {e}")

| Exception | When | Retry? |
|---|---|---|
| `AuthenticationError` | Invalid/expired API key (401) | No — fix credentials |
| `APIError` with 503/504 | Service temporarily unavailable | Yes — with backoff |
| `APIError` with other codes | Bad request, not found, server error | Generally no |

All exceptions extend DocDigitizerError. For more granular exceptions (e.g. BadRequestError, NotFoundError, ServerError), see docdigitizer.exceptions.
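The retry advice in the table can be wrapped in a small helper. The sketch below uses exponential backoff with jitter and retries only on 503/504. Like the example above, it assumes API errors expose a `status_code` attribute. `with_retries` and `is_retryable` are hypothetical and not part of the SDK, and the attempt count and delays are arbitrary defaults you should tune:

```python
import random
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")

# Per the table above: only temporary unavailability is worth retrying.
RETRYABLE_STATUS = frozenset({503, 504})


def is_retryable(exc: BaseException) -> bool:
    """True if the exception carries a 503/504 status_code (assumed attribute)."""
    return getattr(exc, "status_code", None) in RETRYABLE_STATUS


def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> T:
    """Call fn(); on a retryable error, back off exponentially (full jitter) and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise  # e.g. 401/400, or out of attempts: surface the original error
            # Upper bound doubles each attempt (1s, 2s, 4s, ... by default), capped at max_delay.
            bound = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, bound))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Usage: `result = with_retries(lambda: client.process_document("document.pdf"))`. The first attempt will likely consume a file-like object, so pass a path when retrying. If you must pass a file object, call `f.seek(0)` inside the lambda before each attempt.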

Advanced options

Sub-clients

DocDigitizer exposes sub-clients for functionality beyond document processing:

client = DocDigitizer(api_key="your-api-key")

# Registry — browse document types, countries, schemas
doc_types = client.registry.list_doc_types()
countries = client.registry.list_countries()
match = client.registry.find_best_schema(doc_type="INV", country="PT")

# Admin — CRUD on registry resources (requires elevated permissions)
client.admin.create_doc_type(code="INV", name="Invoice")

# Health checks
health = client.sync.health()

Authentication

# API Key (sent as X-API-Key header)
client = DocDigitizer(api_key="your-api-key")

# Bearer Token (sent as Authorization: Bearer header)
client = DocDigitizer(bearer_token="your-token")

# Custom auth strategy (my_custom_auth is your own AuthStrategy implementation)
from docdigitizer.auth import AuthStrategy
client = DocDigitizer(auth=my_custom_auth)

Custom URLs and timeouts

client = DocDigitizer(
    api_key="your-api-key",
    sync_base_url="https://custom.example.com/sync",
    registry_base_url="https://custom.example.com/registry",
    sync_timeout=180.0,       # default: 120s
    registry_timeout=60.0,    # default: 30s
)
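You can read credentials and endpoints from the environment instead of hard-coding them. The sketch below builds keyword arguments for the constructor parameters shown above. The environment variable names (`DOCDIGITIZER_API_KEY` and so on) are a hypothetical convention, and nothing on this page says the SDK reads them on its own:

```python
import os
from collections.abc import Mapping
from typing import Any

# Hypothetical variable names -> DocDigitizer constructor parameters shown above.
_STRING_OPTIONS = {
    "DOCDIGITIZER_SYNC_BASE_URL": "sync_base_url",
    "DOCDIGITIZER_REGISTRY_BASE_URL": "registry_base_url",
}
_FLOAT_OPTIONS = {
    "DOCDIGITIZER_SYNC_TIMEOUT": "sync_timeout",
    "DOCDIGITIZER_REGISTRY_TIMEOUT": "registry_timeout",
}


def client_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Collect constructor kwargs; unset options are omitted so the SDK defaults apply."""
    try:
        kwargs: dict[str, Any] = {"api_key": env["DOCDIGITIZER_API_KEY"]}
    except KeyError:
        raise RuntimeError("DOCDIGITIZER_API_KEY is not set") from None
    for var, param in _STRING_OPTIONS.items():
        if env.get(var):
            kwargs[param] = env[var]
    for var, param in _FLOAT_OPTIONS.items():
        if env.get(var):
            kwargs[param] = float(env[var])  # seconds
    return kwargs
```

Usage: `client = DocDigitizer(**client_kwargs_from_env())`. Any option left unset keeps its default (120s for sync, 30s for registry).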

Context manager

with DocDigitizer(api_key="your-api-key") as client:
    result = client.process_document("document.pdf")
# HTTP sessions are closed automatically when the block exits