Schema Management Quickstart

Learn how to manage extraction schemas using the DocDigitizer Ontology API.

Overview

The Schema Management API allows you to:

  • Browse available document types and countries
  • Find the best schema for a document type/country combination
  • Create custom extraction schemas
  • Manage schema versions and lifecycle

Prerequisites

  • DocDigitizer account with API key
  • API key with Schema Management permissions

Understanding Schemas

What is a Schema?

A schema defines which fields to extract from a document type. For example, an Invoice schema might specify:

{
  "type": "object",
  "properties": {
    "invoiceNumber": { "type": "string" },
    "invoiceDate": { "type": "string", "format": "date" },
    "totalAmount": { "type": "number" },
    "vendorName": { "type": "string" },
    "lineItems": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unitPrice": { "type": "number" }
        }
      }
    }
  }
}
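To see at a glance which fields a schema like the one above would extract, a small helper can walk its properties. This is a minimal sketch: `field_paths` is a hypothetical local helper, not part of the DocDigitizer API, and it assumes the schema follows the JSON-Schema-style shape shown above.

```python
def field_paths(schema, prefix=""):
    """Return dotted paths for every leaf field declared in a schema dict."""
    paths = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if spec.get("type") == "object":
            # Nested object: recurse into its own properties.
            paths.extend(field_paths(spec, prefix=f"{path}."))
        elif spec.get("type") == "array" and spec.get("items", {}).get("type") == "object":
            # Array of objects: mark the repetition with "[]".
            paths.extend(field_paths(spec["items"], prefix=f"{path}[]."))
        else:
            paths.append(path)
    return paths


invoice_schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "invoiceDate": {"type": "string", "format": "date"},
        "totalAmount": {"type": "number"},
        "vendorName": {"type": "string"},
        "lineItems": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unitPrice": {"type": "number"},
                },
            },
        },
    },
}

for path in field_paths(invoice_schema):
    print(path)
```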

Schema Selection

When processing a document, the system selects the best schema based on:

  1. Customer schemas - Your private schemas (if any)
  2. Public schemas with country match - e.g., "Invoice + Portugal"
  3. Generic public schemas - e.g., "Invoice" without country
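The priority order above can be illustrated with a small local function. This is only a sketch of the ordering, not the service's actual implementation. The `select_schema` helper and the candidate dictionaries are hypothetical, and it assumes a private schema qualifies whenever its document type and `customerId` match.

```python
def select_schema(candidates, doc_type, country=None, customer_id=None):
    """Pick the highest-priority candidate, or None if nothing matches."""

    def rank(schema):
        if schema["docTypeCode"] != doc_type:
            return None
        if schema.get("visibility") == "private":
            # 1. Customer schemas come first (assumed to match on customerId).
            owned = customer_id is not None and schema.get("customerId") == customer_id
            return 0 if owned else None
        if country and schema.get("countryCode") == country:
            return 1  # 2. Public schema with a country match.
        if schema.get("countryCode") is None:
            return 2  # 3. Generic public schema.
        return None

    ranked = [(rank(s), s) for s in candidates]
    ranked = [(r, s) for r, s in ranked if r is not None]
    return min(ranked, key=lambda pair: pair[0])[1] if ranked else None


candidates = [
    {"name": "Invoice", "docTypeCode": "Invoice", "countryCode": None,
     "visibility": "public"},
    {"name": "Invoice Portugal", "docTypeCode": "Invoice", "countryCode": "PT",
     "visibility": "public"},
    {"name": "Custom Invoice PT", "docTypeCode": "Invoice", "countryCode": "PT",
     "visibility": "private", "customerId": "cust-1"},
]

print(select_schema(candidates, "Invoice", "PT", "cust-1")["name"])
print(select_schema(candidates, "Invoice", "PT")["name"])
print(select_schema(candidates, "Invoice", "US")["name"])
```

With this sample data, the three calls pick the private schema, the Portugal-specific public schema, and the generic public schema respectively.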

Quick Start

Get Reference Data

List all available document types and countries:

curl https://api.docdigitizer.com/registry/reference-data \
  -H "X-API-Key: your-api-key"

Response:

{
  "docTypes": [
    { "code": "Invoice", "name": "Commercial Invoice", "isActive": true },
    { "code": "Receipt", "name": "Point-of-Sale Receipt", "isActive": true },
    { "code": "Contract", "name": "Legal Contract", "isActive": true }
  ],
  "countries": [
    { "code": "PT", "name": "Portugal", "isActive": true },
    { "code": "US", "name": "United States", "isActive": true },
    { "code": "GB", "name": "United Kingdom", "isActive": true }
  ]
}
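Once the reference data is loaded, it can be used to check a document type and country locally before calling other endpoints. The sketch below works on a trimmed copy of the response shown above. `active_codes` and `is_supported` are hypothetical helper names.

```python
def active_codes(entries):
    """Return the codes of all entries flagged as active."""
    return [entry["code"] for entry in entries if entry.get("isActive")]


def is_supported(reference_data, doc_type, country):
    """Check that both the doc type and the country are listed and active."""
    return (
        doc_type in active_codes(reference_data.get("docTypes", []))
        and country in active_codes(reference_data.get("countries", []))
    )


# Trimmed copy of the sample response above.
reference_data = {
    "docTypes": [
        {"code": "Invoice", "name": "Commercial Invoice", "isActive": True},
        {"code": "Receipt", "name": "Point-of-Sale Receipt", "isActive": True},
    ],
    "countries": [
        {"code": "PT", "name": "Portugal", "isActive": True},
        {"code": "US", "name": "United States", "isActive": True},
    ],
}

print(active_codes(reference_data["docTypes"]))
print(is_supported(reference_data, "Invoice", "PT"))
```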

Find Best Schema

Get the best-matching schema for a document type and country:

curl -X POST https://api.docdigitizer.com/registry/schemas/find-best \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "docTypeCode": "Invoice",
    "countryCode": "PT"
  }'

Response:

{
  "schema": {
    "publicId": "sch_abc123xyz789",
    "publicVersionId": "schv_def456uvw012",
    "name": "Invoice Portugal",
    "version": 2,
    "status": "active",
    "docTypeCode": "Invoice",
    "countryCode": "PT",
    "content": {
      "type": "object",
      "properties": {
        "invoiceNumber": { "type": "string" },
        "nif": { "type": "string" },
        "totalAmount": { "type": "number" }
      }
    }
  },
  "matchType": "exact"
}

Match Types

| Match Type | Description |
| --- | --- |
| exact | Found schema matching both docType and country |
| fallback | Found generic schema (docType only, no country) |
| null | No matching schema found |
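Client code will usually branch on the match type. The following sketch assumes the find-best response has the shape shown earlier and that a JSON `null` match type arrives as `None` in Python. `describe_match` is a hypothetical helper.

```python
def describe_match(result):
    """Turn a find-best response into a short human-readable message."""
    match_type = result.get("matchType")
    schema = result.get("schema")
    if match_type == "exact":
        return f"Using country-specific schema: {schema['name']}"
    if match_type == "fallback":
        return f"Using generic schema: {schema['name']}"
    if match_type is None:
        return "No matching schema found"
    # Any other value is not listed in the match type table.
    raise ValueError(f"Unexpected matchType: {match_type!r}")


exact = {"schema": {"name": "Invoice Portugal"}, "matchType": "exact"}
missing = {"schema": None, "matchType": None}

print(describe_match(exact))
print(describe_match(missing))
```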

Schema Lifecycle

Schemas go through a lifecycle:

DRAFT --> ACTIVE --> DEPRECATED

| Status | Description |
| --- | --- |
| draft | In development, can be modified freely |
| active | Published and in use |
| deprecated | Outdated, use newer version |
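The lifecycle can be modelled as a small transition table, which is handy for guarding calls on the client side. This sketch assumes only the forward transitions in the diagram are allowed. Whether the API permits any others is not stated here, and `can_transition` is a hypothetical helper.

```python
# Forward-only transitions taken from the DRAFT --> ACTIVE --> DEPRECATED diagram.
ALLOWED_TRANSITIONS = {
    "draft": {"active"},
    "active": {"deprecated"},
    "deprecated": set(),
}


def can_transition(current, target):
    """Return True if moving from `current` to `target` follows the diagram."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("draft", "active"))
print(can_transition("deprecated", "active"))
```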

Creating Custom Schemas

Step 1: Create a Draft Schema

curl -X POST https://api.docdigitizer.com/registry/admin/schemas \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Custom Invoice PT",
    "description": "Custom Portuguese invoice schema",
    "docTypeCode": "Invoice",
    "countryCode": "PT",
    "visibility": "private",
    "customerId": "your-customer-uuid",
    "content": {
      "type": "object",
      "properties": {
        "invoiceNumber": { "type": "string" },
        "nif": { "type": "string" },
        "customField": { "type": "string" }
      }
    }
  }'

Step 2: Test the Schema

Use the schema in document processing to verify it works correctly.

Step 3: Activate the Schema

curl -X POST https://api.docdigitizer.com/registry/admin/schemas/sch_abc123/activate \
  -H "X-API-Key: your-api-key"

Schema Visibility

| Visibility | Who Can Use | Requirements |
| --- | --- | --- |
| public | Everyone | Requires docTypeCode AND countryCode |
| community | Logged-in users | Future feature |
| private | Only owner | Requires customerId |
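These requirements can be checked locally before sending a create request. The sketch below mirrors the table. `visibility_errors` is a hypothetical helper, and the server may enforce further rules that are not listed here.

```python
def visibility_errors(payload):
    """Return a list of problems with a create-schema payload's visibility."""
    errors = []
    visibility = payload.get("visibility")
    if visibility == "public":
        for field in ("docTypeCode", "countryCode"):
            if not payload.get(field):
                errors.append(f"public schemas require {field}")
    elif visibility == "private":
        if not payload.get("customerId"):
            errors.append("private schemas require customerId")
    elif visibility == "community":
        errors.append("community visibility is listed as a future feature")
    else:
        errors.append(f"unknown visibility: {visibility!r}")
    return errors


payload = {
    "name": "Custom Invoice PT",
    "docTypeCode": "Invoice",
    "countryCode": "PT",
    "visibility": "private",
}
print(visibility_errors(payload))
```

Here the payload is missing `customerId`, so the helper reports one error. An empty list means the payload satisfies the requirements in the table.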

Versioning

When you update an active schema, a new version is created:

# Update an active schema - creates version 2
curl -X PATCH https://api.docdigitizer.com/registry/admin/schemas/sch_abc123 \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "content": {
      "type": "object",
      "properties": {
        "invoiceNumber": { "type": "string" },
        "nif": { "type": "string" },
        "newField": { "type": "string" }
      }
    }
  }'

Get All Versions

curl https://api.docdigitizer.com/registry/admin/schemas/sch_abc123/versions \
  -H "X-API-Key: your-api-key"

Code Examples

Python - Find and Use Schema

import requests

API_KEY = "your-api-key"
BASE_URL = "https://api.docdigitizer.com/registry"

# Find best schema
response = requests.post(
    f"{BASE_URL}/schemas/find-best",
    headers={
        "X-API-Key": API_KEY,
        "Content-Type": "application/json"
    },
    json={
        "docTypeCode": "Invoice",
        "countryCode": "PT"
    }
)

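# Raise on HTTP errors (for example, an invalid API key) before parsing
response.raise_for_status()
result = response.json()

# "schema" is empty when no match was found
if result.get("schema"):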
    schema = result["schema"]
    print(f"Found schema: {schema['name']} (v{schema['version']})")
    print(f"Match type: {result['matchType']}")
    print(f"Fields: {list(schema['content']['properties'].keys())}")
else:
    print("No matching schema found")

JavaScript - Create Custom Schema

const axios = require('axios');

const API_KEY = 'your-api-key';
const BASE_URL = 'https://api.docdigitizer.com/registry';

async function createSchema() {
  const response = await axios.post(
    `${BASE_URL}/admin/schemas`,
    {
      name: 'Custom Invoice Schema',
      docTypeCode: 'Invoice',
      countryCode: 'US',
      visibility: 'private',
      customerId: 'your-customer-uuid',
      content: {
        type: 'object',
        properties: {
          invoiceNumber: { type: 'string' },
          ein: { type: 'string' },  // US-specific field
          totalAmount: { type: 'number' }
        }
      }
    },
    {
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      }
    }
  );

  console.log('Created schema:', response.data.publicId);
  return response.data;
}

Best Practices

  1. Start with public schemas - Check if a suitable public schema exists before creating custom ones

  2. Use country-specific schemas - They include region-specific fields (NIF for Portugal, EIN for US, etc.)

  3. Test before activating - Thoroughly test draft schemas before activation

  4. Version carefully - Each update to an active schema creates a new version

  5. Deprecate gracefully - When replacing a schema, deprecate the old one rather than deleting

API Endpoints Reference

| Endpoint | Method | Description |
| --- | --- | --- |
| /reference-data | GET | Get all doc types and countries |
| /doc-types | GET | List active document types |
| /countries | GET | List active countries |
| /schemas/find-best | POST | Find best matching schema |
| /admin/schemas | GET | List schemas (with filters) |
| /admin/schemas | POST | Create new schema |
| /admin/schemas/{id} | GET | Get schema by ID |
| /admin/schemas/{id} | PATCH | Update schema |
| /admin/schemas/{id} | DELETE | Delete draft schema |
| /admin/schemas/{id}/activate | POST | Activate schema |
| /admin/schemas/{id}/deprecate | POST | Deprecate schema |
| /admin/schemas/{id}/versions | GET | Get all versions |
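All endpoints in this table share the same base URL and `X-API-Key` header, so a single request builder covers them. The sketch below uses only the Python standard library. `build_request` is a hypothetical helper, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://api.docdigitizer.com/registry"


def build_request(method, path, api_key, body=None):
    """Build (but do not send) a request for one of the endpoints above."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"X-API-Key": api_key}
    if data is not None:
        # Only requests with a JSON body need a content type.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}{path}", data=data, headers=headers, method=method
    )


# Example: prepare a deprecate call for a placeholder schema ID.
req = build_request("POST", "/admin/schemas/sch_abc123/deprecate", "your-api-key")
print(req.get_method(), req.full_url)
```

To send a prepared request, pass it to `urllib.request.urlopen(req)` and parse the response body with `json.load`.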

Next Steps