Schemas used by the API
Please note that:
- Only one model is used as the baseline (the model whose IsBaseLine field, defined in the Model schema, is true).
- The content of the baseline document is in the Baseline field of the Comparison Output schema.
- The content of the document obtained from each non-baseline model is in the RawExtraction field of the Model schema.
- The comparison of each model's document against the baseline is in the Diff field of the Model schema.
- The Diff field contains an array of features by page. Features extracted by both models (baseline and non-baseline) have extraction=true. Each feature contains fields, and each document field has the following data:
  - baselineValue: the content obtained from the baseline.
  - extractionValue: the content obtained from the current model.
  - isMatch (*): true when both values are equal.
  - levenshteinDistance (*): the Levenshtein distance between the baseline value and the extraction value.
  - similarityScore (*): the similarity score between the baseline value and the extraction value. Higher numbers mean higher similarity; values range from 0 (totally different) to 100 (equal). The formula is:
similarityScore = (1 - levenshteinDistance / Max(Length(baselineValue), Length(extractionValue))) * 100
(*) The corresponding TAICI fields have the same meaning, but the comparison ignores spaces, case, and accents.
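The field metrics described above can be reproduced locally to sanity-check a response. The following is a minimal sketch, not the service's actual implementation. It assumes that both values are plain strings, that two empty values score 100, and that the TAICI normalization removes all whitespace, strips combining accent marks, and case-folds. The helper names (`levenshtein`, `similarity_score`, `normalize_taici`, `compare`) are hypothetical.

```python
import unicodedata


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_score(baseline, extraction):
    """(1 - distance / max length) * 100, as in the formula above."""
    longest = max(len(baseline), len(extraction))
    if longest == 0:
        return 100.0  # assumption: two empty values count as equal
    return (1 - levenshtein(baseline, extraction) / longest) * 100


def normalize_taici(value):
    """Assumed TAICI normalization: no spaces, no accents, no case."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(no_accents.split()).casefold()


def compare(baseline, extraction):
    """Build the six comparison metrics using the DiffField key names."""
    b_norm, e_norm = normalize_taici(baseline), normalize_taici(extraction)
    return {
        "isMatch": baseline == extraction,
        "levenshteinDistance": levenshtein(baseline, extraction),
        "similarityScore": similarity_score(baseline, extraction),
        "isMatchTAICI": b_norm == e_norm,
        "levenhsteinDistanceTAICI": levenshtein(b_norm, e_norm),
        "similarityScoreTAICI": similarity_score(b_norm, e_norm),
    }


print(compare("Caf\u00e9 Ol\u00e9", "cafe ole"))
```

In this example the strict comparison reports a mismatch because the values differ in case and accents, while the TAICI comparison treats them as equal.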
Comparison Output
type: array
items:
  type: object
  properties:
    Customer:
      type: string
      description: Customer name
    Version:
      type: string
      description: ComparisonOutput schema version
    DocumentId:
      type: string
      description: Identifier of the document requested for comparison
    CallbackUrl:
      type: string
      description: URI of the customer callback service
    CallbackMethod:
      type: string
      description: HTTP method used to call the customer callback service
    CallbackHeaders:
      type: object
      description: Dictionary of HTTP headers (header name as key, header value as value) used to call the customer callback service
    comparatorId:
      type: string
      description: Comparison identifier, which distinguishes multiple comparison requests for the same document
    state:
      type: string
      enum: [CREATED, PROCESSING, COMPLETED, ERROR]
    ComparatorCreatedAt:
      type: string
      format: date-time
    Models:
      type: array
      items:
        $ref: '#/Model'
    Baseline:
      type: object
      description: "Schema of the document used as the baseline. Example: '#/AnnotatedDocument'"
  required:
    - Customer
    - Version
    - DocumentId
    - CallbackUrl
    - CallbackMethod
    - CallbackHeaders
    - comparatorId
    - state
    - ComparatorCreatedAt
    - Models
    - Baseline
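As an illustration of how a client might consume this schema, the sketch below takes one item of a Comparison Output array and separates the baseline model from the models compared against it. The sample payload values (customer, identifiers, model names, URL) are invented for the example, and `split_models` is a hypothetical helper, not part of the API.

```python
def split_models(comparison):
    """Return (baseline_model, other_models) for one Comparison Output item."""
    models = comparison["Models"]
    baselines = [m for m in models if m.get("IsBaseLine")]
    if len(baselines) != 1:
        # The notes above state that only one model is used as the baseline.
        raise ValueError(
            f"expected exactly one baseline model, found {len(baselines)}"
        )
    others = [m for m in models if not m.get("IsBaseLine")]
    return baselines[0], others


# Hypothetical payload: every value below is made up for illustration.
payload = [
    {
        "Customer": "acme",
        "Version": "1",
        "DocumentId": "doc-001",
        "CallbackUrl": "https://example.com/callback",
        "CallbackMethod": "POST",
        "CallbackHeaders": {"Content-Type": "application/json"},
        "comparatorId": "cmp-001",
        "state": "COMPLETED",
        "ComparatorCreatedAt": "2020-01-01T00:00:00Z",
        "Models": [
            {"name": "model-a", "IsBaseLine": True},
            {"name": "model-b", "IsBaseLine": False, "Diff": [], "RawExtraction": {}},
        ],
        "Baseline": {},
    }
]

for comparison in payload:
    if comparison["state"] != "COMPLETED":
        continue  # results are only expected once the comparison has finished
    baseline, others = split_models(comparison)
    print(baseline["name"], [m["name"] for m in others])
```

Skipping items whose state is not COMPLETED is an assumption about when Diff and Baseline are populated; the schema itself does not say so.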
Model
type: object
properties:
  name:
    type: string
    description: Name of the model
  IsBaseLine:
    type: boolean
    description: True when the model is used to get the baseline document
  Diff:
    type: array
    items:
      type: array
      items:
        type: object
        properties:
          pageNumber:
            type: number
            description: Page number, starting at 0 (page 0 is the first page)
          features:
            type: object
            properties:
              <feature_UUID>:
                type: object
                description: This is a dictionary; the key is the UUID of each feature
                properties:
                  uuid:
                    type: string
                    description: Feature unique identifier
                  name:
                    type: string
                    description: Feature name
                  label:
                    type: string
                    description: Feature label
                  extraction:
                    type: boolean
                    description: True when the feature was extracted
                  fields:
                    type: object
                    properties:
                      <field_UUID>:
                        $ref: '#/DiffField'
                        description: This is a dictionary; the key is the UUID of each field
                    required:
                      - <field_UUID>
                required:
                  - uuid
                  - name
                  - label
                  - extraction
                  - fields
            required:
              - <feature_UUID>
        required:
          - pageNumber
          - features
  RawExtraction:
    type: object
    description: "Schema of the document extracted by this model. Example: '#/AnnotatedDocument'"
required:
  - name
  - IsBaseLine
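The nested Diff structure can be walked page by page to list the fields where a model disagrees with the baseline. This is a rough sketch under a few assumptions: the schema declares Diff as an array of arrays of page objects, so the hypothetical `iter_pages` helper accepts either a flat or a nested list; features with extraction=false are skipped; and the sample data is invented.

```python
def iter_pages(diff):
    """Yield page objects whether Diff is a flat list or a list of lists."""
    for entry in diff:
        if isinstance(entry, list):
            yield from entry
        else:
            yield entry


def mismatched_fields(model):
    """Collect fields whose isMatch flag is false, with their location."""
    rows = []
    for page in iter_pages(model.get("Diff", [])):
        for feature in page["features"].values():
            if not feature["extraction"]:
                continue  # assumption: only extracted features carry comparisons
            for field in feature["fields"].values():
                if not field["isMatch"]:
                    rows.append({
                        "page": page["pageNumber"],
                        "feature": feature["name"],
                        "field": field["name"],
                        "baseline": field["baselineValue"],
                        "extraction": field["extractionValue"],
                    })
    return rows


# Hypothetical model entry, trimmed to the keys this sketch reads.
sample_model = {
    "name": "model-b",
    "IsBaseLine": False,
    "Diff": [[{
        "pageNumber": 0,
        "features": {
            "feat-1": {
                "uuid": "feat-1", "name": "invoice", "label": "Invoice",
                "extraction": True,
                "fields": {
                    "fld-1": {"uuid": "fld-1", "name": "total", "isMatch": False,
                              "baselineValue": "100.00",
                              "extractionValue": "100,00"},
                    "fld-2": {"uuid": "fld-2", "name": "date", "isMatch": True,
                              "baselineValue": "2020-01-01",
                              "extractionValue": "2020-01-01"},
                },
            },
        },
    }]],
}

for row in mismatched_fields(sample_model):
    print(row)
```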
DiffField
type: object
properties:
  uuid:
    type: string
    description: Field unique identifier
  name:
    type: string
    description: Field name
  label:
    type: string
    description: Field label
  dataType:
    type: number
    description: Field data type code
  dataTypeText:
    type: string
    description: Field data type description
  baselineValue:
    type: object
    description: Field value from the baseline document
  extractionValue:
    type: object
    description: Field value from this extractor
  levenshteinDistance:
    type: number
    description: Levenshtein distance between the baseline value and the extraction value
  similarityScore:
    type: number
    description: Similarity score between the baseline value and the extraction value. Higher numbers mean higher similarity, up to 100 (equal).
  isMatch:
    type: boolean
    description: True when both values are equal
  levenhsteinDistanceTAICI:
    type: number
    description: Same as levenshteinDistance, but ignoring spaces, case, and accents
  similarityScoreTAICI:
    type: number
    description: Same as similarityScore, but based on levenhsteinDistanceTAICI
  isMatchTAICI:
    type: boolean
    description: True when both values are equal after ignoring spaces, case, and accents
required:
  - uuid
  - name
  - label
  - dataType
  - dataTypeText
  - baselineValue
  - extractionValue
  - levenshteinDistance
  - similarityScore
  - isMatch
  - levenhsteinDistanceTAICI
  - similarityScoreTAICI
  - isMatchTAICI
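One practical use of the paired strict and TAICI flags is to separate genuine extraction differences from differences in spacing, case, or accents only. The sketch below shows one possible way to do that. The `classify` helper, its three labels, and the sample field are hypothetical and not part of the API.

```python
def classify(field):
    """Label a DiffField as 'exact', 'formatting-only' or 'different'."""
    if field["isMatch"]:
        return "exact"
    if field["isMatchTAICI"]:
        # Values differ only by spaces, case or accents.
        return "formatting-only"
    return "different"


# Hypothetical DiffField; the key spelling follows the schema above.
sample_field = {
    "uuid": "fld-1",
    "name": "city",
    "label": "City",
    "dataType": 1,
    "dataTypeText": "Text",
    "baselineValue": "S\u00e3o Paulo",
    "extractionValue": "sao paulo",
    "levenshteinDistance": 3,
    "similarityScore": 66.67,
    "isMatch": False,
    "levenhsteinDistanceTAICI": 0,
    "similarityScoreTAICI": 100,
    "isMatchTAICI": True,
}

print(classify(sample_field))
```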
AnnotatedDocument (example used by Get Document)
type: object
properties:
  id:
    type: string
  createdAt:
    type: string
  changedAt:
    type: string
  rejected:
    type: boolean
  reviewed:
    type: boolean
  originalFileName:
    type: string
  originalDocumentClass:
    type: string
  annotations:
    type: array
    items:
      type: object
      properties:
        pageNumber:
          type: number
        features:
          type: object
          properties:
            <feature_UUID>:
              type: object
              properties:
                uuid:
                  type: string
                name:
                  type: string
                label:
                  type: string
                createdAt:
                  type: string
                origin:
                  type: number
                originText:
                  type: string
                confidence:
                  type: number
                fields:
                  type: object
                  properties:
                    <field_UUID>:
                      type: object
                      properties:
                        uuid:
                          type: string
                        name:
                          type: string
                        label:
                          type: string
                        dataType:
                          type: number
                        dataTypeText:
                          type: string
                        createdAt:
                          type: string
                        confidence:
                          type: number
                        fieldValue:
                          type: string
                        origin:
                          type: number
                        originText:
                          type: string
                        annotationInformation:
                          type: object
                        subFields:
                          type: object
                      required:
                        - uuid
                        - name
                        - label
                        - dataType
                        - dataTypeText
                        - createdAt
                        - confidence
                        - fieldValue
                        - origin
                        - originText
                        - annotationInformation
                        - subFields
                  required:
                    - <field_UUID>
              required:
                - uuid
                - name
                - label
                - createdAt
                - origin
                - originText
                - confidence
                - fields
          required:
            - <feature_UUID>
      required:
        - pageNumber
        - features
  extractions:
    type: object
  state:
    type: number
  stateText:
    type: string
  totalDocPages:
    type: number
  customerName:
    type: string
  slaMoment:
    type: string
  isSync:
    type: boolean
  isHumanRevision:
    type: boolean
  finishedAt:
    type: string
  slaDelta:
    type: string
required:
  - id
  - createdAt
  - changedAt
  - rejected
  - reviewed
  - originalFileName
  - originalDocumentClass
  - annotations
  - extractions
  - state
  - stateText
  - totalDocPages
  - customerName
  - slaMoment
  - isSync
  - isHumanRevision
  - finishedAt
  - slaDelta
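Because both Baseline and RawExtraction may carry a document of this shape, it can be convenient to flatten the annotations into a simple lookup of field values. The following is a minimal sketch with a hypothetical `field_values` helper and invented sample data; it reads only the keys shown in the schema above.

```python
def field_values(document):
    """Map (pageNumber, feature name, field name) to the field's value."""
    values = {}
    for page in document.get("annotations", []):
        for feature in page["features"].values():
            for field in feature["fields"].values():
                key = (page["pageNumber"], feature["name"], field["name"])
                values[key] = field["fieldValue"]
    return values


# Hypothetical document, trimmed to the keys this sketch reads.
sample_document = {
    "id": "doc-001",
    "annotations": [
        {
            "pageNumber": 0,
            "features": {
                "feat-1": {
                    "uuid": "feat-1",
                    "name": "invoice",
                    "fields": {
                        "fld-1": {"uuid": "fld-1", "name": "total",
                                  "fieldValue": "100.00", "confidence": 0.98},
                        "fld-2": {"uuid": "fld-2", "name": "date",
                                  "fieldValue": "2020-01-01", "confidence": 0.91},
                    },
                },
            },
        },
    ],
}

for key, value in field_values(sample_document).items():
    print(key, value)
```

Keying on names rather than UUIDs is a design choice for readability; if feature or field names can repeat on a page, keying on the `uuid` values would avoid collisions.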