Python SDK
Installation
pip install docdigitizer
Requirements: Python 3.10+. The only runtime dependency is requests.
Quick Start
from docdigitizer import DocDigitizer
client = DocDigitizer(api_key="your-api-key")
result = client.process_document("exam.pdf")
print(result.output.doc_type) # "MedicalOrder"
print(result.output.country) # "PT"
for field in result.output.fields:
    print(f"{field.name}: {field.value}")
What do I get back?
process_document() returns a ProcessingResult. Here's a real example:
result.output.doc_type → "MedicalOrder"
result.output.confidence → 0
result.output.country → "PT"
result.output.fields → [
    ExtractionField(name="patientName", value="JOÃO JOSÉ DA COSTA FERNANDES"),
    ExtractionField(name="patientId", value="1527620"),
    ExtractionField(name="examDescription", value="ECO OSTEOARTICULAR - JOELHO"),
    ExtractionField(name="requestingDoctor", value="Fernando Marques Moura"),
    ExtractionField(name="clinicalInformation", value="dores no joelho, ..."),
    ...
]
result.trace_id → "4ESZ13H"
result.document_id → None
ProcessingResult
| Field | Type | Description |
|---|---|---|
| output | ExtractionOutput | Extracted data (see below) |
| trace_id | str \| None | Trace ID for debugging |
| document_id | str \| None | Document identifier |
| context_id | str \| None | Context identifier |
| timers | dict | Processing time breakdown |
| headers | dict | Raw response headers |
| raw | dict | Full raw JSON response |
ExtractionOutput
| Field | Type | Description |
|---|---|---|
| doc_type | str \| None | Detected document type (e.g. "MedicalOrder", "INV") |
| confidence | float \| None | Classification confidence (0.0 – 1.0) |
| country | str \| None | Detected country code (e.g. "PT") |
| fields | list[ExtractionField] | Extracted key-value fields |
| raw | dict | Raw output data |
ExtractionField
| Field | Type | Description |
|---|---|---|
| name | str | Field name (e.g. "patientName", "total") |
| value | Any | Extracted value |
| confidence | float \| None | Extraction confidence (0.0 – 1.0), if available |
| bounding_box | dict \| None | Bounding box coordinates on the page, if available |
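Because fields is a flat list, a common next step is to index it by name. The sketch below relies only on the attributes documented in the table above. `Field` is a hypothetical stand-in for ExtractionField so the snippet runs without the SDK installed, and `fields_to_dict` and its `min_confidence` threshold are illustrative names, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Field:
    """Hypothetical stand-in mirroring the documented ExtractionField attributes."""
    name: str
    value: Any
    confidence: Optional[float] = None


def fields_to_dict(fields, min_confidence=0.0):
    """Index fields by name, skipping those scored below min_confidence.

    Fields with no confidence (None) are kept, since the score is optional.
    If a name appears more than once, the last occurrence wins.
    """
    indexed = {}
    for field in fields:
        if field.confidence is not None and field.confidence < min_confidence:
            continue
        indexed[field.name] = field.value
    return indexed


# The confidence value below is made up for illustration.
sample = [
    Field("patientName", "JOÃO JOSÉ DA COSTA FERNANDES"),
    Field("patientId", "1527620", confidence=0.42),
]
print(fields_to_dict(sample, min_confidence=0.5))
```

With a real result, pass `result.output.fields` in place of `sample`.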
Ways to send the file
# 1. Path string
result = client.process_document("path/to/document.pdf")
# 2. pathlib.Path
from pathlib import Path
result = client.process_document(Path("documents") / "exam.pdf")
# 3. File-like object (binary mode)
with open("document.pdf", "rb") as f:
    result = client.process_document(f)
Optional parameters
result = client.process_document(
"document.pdf",
document_id="550e8400-e29b-41d4-a716-446655440000",
context_id="7c9e6679-7425-40de-944b-e07fc1f90ae7",
doc_type="MedicalOrder",
country="PT",
extra_params={"customField": "value"},
)

| Parameter | Type | Why use it? |
|---|---|---|
| document_id | str | Attach your own ID to the document for tracking |
| context_id | str | Group related documents together (e.g. same workflow) |
| doc_type | str | Hint the document type (e.g. "INV", "MedicalOrder") for better extraction |
| country | str | Hint the country (e.g. "PT") for better extraction |
| extra_params | dict | Send additional parameters specific to your use case |
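As one way to use context_id for grouping, the sketch below processes every PDF in a folder under a single shared ID. `process_folder` is a hypothetical helper, not an SDK function. It assumes only that the client exposes `process_document(path, **kwargs)` as shown above, and that a UUID string is an acceptable context_id (the example value above has that shape, but the source does not say it is required).

```python
import uuid
from pathlib import Path


def process_folder(client, folder, doc_type=None, country=None):
    """Process every PDF in `folder`, sharing one generated context_id.

    `client` is any object with a process_document(path, **kwargs) method,
    such as a DocDigitizer instance. Returns (context_id, {filename: result}).
    """
    context_id = str(uuid.uuid4())  # assumption: any UUID string is accepted

    # Only forward the hints the caller actually supplied.
    hints = {}
    if doc_type is not None:
        hints["doc_type"] = doc_type
    if country is not None:
        hints["country"] = country

    results = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        results[path.name] = client.process_document(
            path, context_id=context_id, **hints
        )
    return context_id, results
```

A call such as `process_folder(client, "documents", country="PT")` returns the generated ID alongside the per-file results, so the batch can be traced later.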
Errors
The SDK raises typed exceptions. In practice, these are the three cases that matter:
from docdigitizer import DocDigitizer, AuthenticationError, APIError
try:
    result = client.process_document("document.pdf")
except AuthenticationError:
    # Invalid or expired API key: don't retry, fix credentials
    print("Invalid credentials")
except APIError as e:
    if e.status_code in (503, 504):
        # Service temporarily unavailable: worth retrying with backoff
        print(f"Service unavailable ({e.status_code}), retrying...")
    else:
        # Other API errors (400, 404, 500, ...): typically not worth retrying
        print(f"API error: {e}")

| Exception | When | Retry? |
|---|---|---|
| AuthenticationError | Invalid/expired API key (401) | No, fix credentials |
| APIError with 503/504 | Service temporarily unavailable | Yes, with backoff |
| APIError with other codes | Bad request, not found, server error | Generally no |
All exceptions extend DocDigitizerError. For more granular exceptions (e.g. BadRequestError, NotFoundError, ServerError), see docdigitizer.exceptions.
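The "retry with backoff" advice for 503/504 can be sketched as a small wrapper. This is a minimal sketch, not something the SDK provides: `call_with_backoff` and its parameters are hypothetical, and `RetryableError` is a stand-in for APIError so the snippet runs without the SDK. The only behaviour assumed from the SDK is that the exception carries a `status_code` attribute, as in the example above.

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for the SDK's APIError, carrying a status_code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_backoff(fn, error_type=RetryableError, retry_statuses=(503, 504),
                      max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying when error_type is raised with a retryable status.

    Waits base_delay, then 2x, 4x, ... between attempts. The error is
    re-raised if the status is not retryable or the attempts run out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except error_type as exc:
            if exc.status_code not in retry_statuses or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

With the SDK, a call might look like `call_with_backoff(lambda: client.process_document("document.pdf"), error_type=APIError)`. Errors with any other status code, including authentication failures, propagate immediately rather than being retried.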
Advanced options
Sub-clients
DocDigitizer exposes sub-clients for functionality beyond document processing:
client = DocDigitizer(api_key="your-api-key")
# Registry — browse document types, countries, schemas
doc_types = client.registry.list_doc_types()
countries = client.registry.list_countries()
match = client.registry.find_best_schema(doc_type="INV", country="PT")
# Admin — CRUD on registry resources (requires elevated permissions)
client.admin.create_doc_type(code="INV", name="Invoice")
# Health checks
health = client.sync.health()
Authentication
# API Key (sent as X-API-Key header)
client = DocDigitizer(api_key="your-api-key")
# Bearer Token (sent as Authorization: Bearer header)
client = DocDigitizer(bearer_token="your-token")
# Custom auth strategy
from docdigitizer.auth import AuthStrategy
client = DocDigitizer(auth=my_custom_auth)
Custom URLs and timeouts
client = DocDigitizer(
api_key="your-api-key",
sync_base_url="https://custom.example.com/sync",
registry_base_url="https://custom.example.com/registry",
sync_timeout=180.0, # default: 120s
registry_timeout=60.0, # default: 30s
)
Context manager
with DocDigitizer(api_key="your-api-key") as client:
    result = client.process_document("document.pdf")
# HTTP sessions closed automatically
