Schemas

Please consider that:

Only one model is used as the baseline (defined in the Model schema, field IsBaseLine=true).
The content of the document obtained as the baseline is in the Comparison Output schema, Baseline field.
The content of the document obtained from each non-baseline model is in the Model schema, RawExtraction field.
The comparison of each model document against the baseline is in the Model schema, Diff field.
The Diff field contains an array of features grouped by page. Features extracted by both models (baseline and non-baseline) have extraction=true. Each feature contains fields, and each field has the following data:
- baselineValue: the content obtained from the baseline.
- extractionValue: the content obtained from the current model.
- isMatch (*): true when both values are equal.
- levenshteinDistance (*): the Levenshtein distance between the baseline value and the extraction value.
- similarityScore (*): the similarity score between the baseline value and the extraction value. Higher numbers mean higher similarity; values range from 0 (totally different) to 100 (equal). The formula is:
similarityScore = (1 - levenshteinDistance / Max(Length(baselineValue), Length(extractionValue))) * 100

(*) The corresponding TAICI fields have the same meaning, but the comparison ignores spaces, case, and accents.
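
The metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service's implementation: the function names are hypothetical, the values are assumed to be plain strings (the schema types them as object), two empty values are assumed to score 100, and the TAICI normalization shown (remove whitespace, fold case, strip accents) is a guess at what "not considering spaces, case or accents" means in practice.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_score(baseline: str, extraction: str) -> float:
    """(1 - distance / longest length) * 100, as in the formula above."""
    longest = max(len(baseline), len(extraction))
    if longest == 0:
        return 100.0  # assumption: two empty values are treated as equal
    return (1 - levenshtein(baseline, extraction) / longest) * 100


def normalize_taici(value: str) -> str:
    """Assumed TAICI normalization: no whitespace, no case, no accents."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return "".join(without_accents.split()).casefold()


baseline, extraction = "José Pérez", "jose perez"
print(levenshtein(baseline, extraction))
print(similarity_score(normalize_taici(baseline), normalize_taici(extraction)))
```

Under these assumptions the two sample values differ in the strict comparison but are identical once normalized, which is the situation the TAICI fields are meant to expose.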

Comparison Output

      type: array
      items:
        type: object
        properties:
          Customer:
            type: string
            description: Customer name
          Version:
            type: string
            description: ComparisonOutput schema version
          DocumentId:
            type: string
            description: Document identifier requested for comparison
          CallbackUrl:
            type: string
            description: URI of the customer callback service
          CallbackMethod:
            type: string
            description: HTTP method used to call the customer callback service
          CallbackHeaders:
            type: object
            description: Dictionary of HTTP header names (keys) and values used to call the customer callback service
          comparatorId:
            type: string
            description: Comparison identifier, used to distinguish multiple comparison requests for the same document
          state:
            type: string
            enum: [CREATED, PROCESSING, COMPLETED, ERROR]
          ComparatorCreatedAt:
            type: string
            format: date-time
          Models:
            type: array
            items:
              $ref: '#/Model'
          Baseline:
            type: object
            description: "Schema of the document used as the baseline. Example: '#/AnnotatedDocument'"
        required:
          - Customer
          - Version
          - DocumentId
          - CallbackUrl
          - CallbackMethod
          - CallbackHeaders
          - comparatorId
          - state
          - ComparatorCreatedAt
          - Models
          - Baseline
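
As a rough illustration of how a consumer might read this structure, the sketch below checks whether a comparison has finished and picks out the single model flagged as the baseline. The helper names and the sample payload are hypothetical; only the property names (Models, IsBaseLine, state, comparatorId) come from the schemas in this section.

```python
from typing import Any, Optional


def find_baseline_model(comparison: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the model flagged IsBaseLine=true, or None if there is none."""
    for model in comparison.get("Models", []):
        if model.get("IsBaseLine") is True:
            return model
    return None


def is_completed(comparison: dict[str, Any]) -> bool:
    """True once the comparison reached the COMPLETED state."""
    return comparison.get("state") == "COMPLETED"


# Hypothetical payload: a trimmed Comparison Output array with made-up values.
sample_output = [
    {
        "DocumentId": "doc-1",
        "comparatorId": "cmp-1",
        "state": "COMPLETED",
        "Models": [
            {"name": "model-a", "IsBaseLine": True},
            {"name": "model-b", "IsBaseLine": False, "Diff": []},
        ],
        "Baseline": {},
    }
]

for comparison in sample_output:
    if is_completed(comparison):
        baseline = find_baseline_model(comparison)
        print(comparison["comparatorId"], baseline["name"] if baseline else None)
```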

Model

      type: object
      properties:
        name:
          type: string
          description: Name of the model
        IsBaseLine:
          type: boolean
          description: True when the model is used to get the baseline document
        Diff:
          type: array
          items:
            type: array
            items:
              type: object
              properties:
                pageNumber:
                  type: number
                  description: Page number, starting at 0 (page 0 is the first page)
                features:
                  type: object
                  properties:
                    <feature_UUID>:
                      type: object
                      description: Dictionary keyed by the UUID of each feature
                      properties:
                        uuid:
                          type: string
                          description: Feature unique identifier
                        name:
                          type: string
                          description: Feature name
                        label:
                          type: string
                          description: Feature label
                        extraction:
                          type: boolean
                          description: True when feature was extracted
                        fields:
                          type: object
                          properties:
                            <field_UUID>:
                              $ref: '#/DiffField'
                              description: Dictionary keyed by the UUID of each field
                          required:
                            - <field_UUID>
                      required:
                        - uuid
                        - name
                        - label
                        - extraction
                        - fields
                  required:
                    - <feature_UUID>
              required:
                - pageNumber
                - features
        RawExtraction:
          type: object
          description: "Schema of the document extracted by this model. Example: '#/AnnotatedDocument'"
      required:
        - name
        - IsBaseLine
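
The nesting of Diff (an array of arrays of page objects, each holding dictionaries of features and fields keyed by UUID) can be easier to follow with a small traversal. The sketch below is a hypothetical helper, assuming a payload shaped exactly as the Model schema describes; it yields only fields of features whose extraction flag is true. The sample data is invented for illustration.

```python
from typing import Any, Iterator


def iter_diff_fields(model: dict[str, Any]) -> Iterator[tuple[int, str, dict[str, Any]]]:
    """Yield (pageNumber, feature name, field) for every extracted feature."""
    for page_group in model.get("Diff", []):      # outer array
        for page in page_group:                   # inner array of page objects
            for feature in page["features"].values():
                if not feature.get("extraction"):
                    continue                      # skip features not extracted
                for field in feature["fields"].values():
                    yield page["pageNumber"], feature["name"], field


# Hypothetical Model payload with made-up identifiers and values.
sample_model = {
    "name": "model-b",
    "IsBaseLine": False,
    "Diff": [[
        {
            "pageNumber": 0,
            "features": {
                "f-1": {
                    "uuid": "f-1", "name": "invoice", "label": "Invoice",
                    "extraction": True,
                    "fields": {
                        "d-1": {"uuid": "d-1", "name": "total", "isMatch": False},
                    },
                },
                "f-2": {
                    "uuid": "f-2", "name": "vendor", "label": "Vendor",
                    "extraction": False,
                    "fields": {},
                },
            },
        }
    ]],
}

for page_number, feature_name, field in iter_diff_fields(sample_model):
    print(page_number, feature_name, field["name"], field["isMatch"])
```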

DiffField

      type: object
      properties:
        uuid:
          type: string
          description: Field unique identifier
        name:
          type: string
          description: Field name
        label:
          type: string
          description: Field label
        dataType:
          type: number
          description: Field data type code
        dataTypeText:
          type: string
          description: Field data type description
        baselineValue:
          type: object
          description: Field value from the baseline document
        extractionValue:
          type: object
          description: Field value from this extractor
        levenshteinDistance:
          type: number
          description: Levenshtein distance between the baseline value and the extraction value
        similarityScore:
          type: number
          description: Similarity score between the baseline value and the extraction value. Higher numbers mean higher similarity, from 0 (totally different) up to 100 (equal).
        isMatch:
          type: boolean
          description: True when both values are equal
        levenhsteinDistanceTAICI:
          type: number
          description: Same as levenshteinDistance, but ignoring spaces, case, and accents
        similarityScoreTAICI:
          type: number
          description: Identical to Similarity score but based on levenhsteinDistanceTAICI
        isMatchTAICI:
          type: boolean
          description: True when both values are equal after ignoring spaces, case, and accents
      required:
        - uuid
        - name
        - label
        - dataType
        - dataTypeText
        - baselineValue
        - extractionValue
        - levenshteinDistance
        - similarityScore
        - isMatch
        - levenhsteinDistanceTAICI
        - similarityScoreTAICI
        - isMatchTAICI
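
One possible use of the strict and TAICI flags together is to separate real extraction differences from differences that are only about spacing, case, or accents. The sketch below is an assumption about how a consumer might aggregate DiffField entries, not part of the service; the helper names and sample values are hypothetical.

```python
from typing import Any


def match_rates(fields: list[dict[str, Any]]) -> dict[str, float]:
    """Percentage of fields matching under the strict and TAICI comparisons."""
    total = len(fields)
    if total == 0:
        return {"strict": 0.0, "taici": 0.0}  # assumption: no fields means 0%
    strict = sum(1 for f in fields if f["isMatch"])
    taici = sum(1 for f in fields if f["isMatchTAICI"])
    return {"strict": 100 * strict / total, "taici": 100 * taici / total}


def formatting_only_mismatches(fields: list[dict[str, Any]]) -> list[str]:
    """Names of fields that differ only in spacing, case, or accents."""
    return [f["name"] for f in fields if f["isMatchTAICI"] and not f["isMatch"]]


# Hypothetical DiffField entries, trimmed to the properties used here.
sample_fields = [
    {"name": "total", "isMatch": True, "isMatchTAICI": True},
    {"name": "vendor", "isMatch": False, "isMatchTAICI": True},
    {"name": "date", "isMatch": False, "isMatchTAICI": False},
    {"name": "currency", "isMatch": True, "isMatchTAICI": True},
]

print(match_rates(sample_fields))
print(formatting_only_mismatches(sample_fields))
```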

AnnotatedDocument (example used by Get Document)

      type: object
      properties:
        id:
          type: string
        createdAt:
          type: string
        changedAt:
          type: string
        rejected:
          type: boolean
        reviewed:
          type: boolean
        originalFileName:
          type: string
        originalDocumentClass:
          type: string
        annotations:
          type: array
          items:
            type: object
            properties:
              pageNumber:
                type: number
              features:
                type: object
                properties:
                  <feature_UUID>:
                    type: object
                    properties:
                      uuid:
                        type: string
                      name:
                        type: string
                      label:
                        type: string
                      createdAt:
                        type: string
                      origin:
                        type: number
                      originText:
                        type: string
                      confidence:
                        type: number
                      fields:
                        type: object
                        properties:
                          <field_UUID>:
                            type: object
                            properties:
                              uuid:
                                type: string
                              name:
                                type: string
                              label:
                                type: string
                              dataType:
                                type: number
                              dataTypeText:
                                type: string
                              createdAt:
                                type: string
                              confidence:
                                type: number
                              fieldValue:
                                type: string
                              origin:
                                type: number
                              originText:
                                type: string
                              annotationInformation:
                                type: object
                              subFields:
                                type: object
                            required:
                              - uuid
                              - name
                              - label
                              - dataType
                              - dataTypeText
                              - createdAt
                              - confidence
                              - fieldValue
                              - origin
                              - originText
                              - annotationInformation
                              - subFields
                        required:
                          - <field_UUID>
                    required:
                      - uuid
                      - name
                      - label
                      - createdAt
                      - origin
                      - originText
                      - confidence
                      - fields
                required:
                  - <feature_UUID>
            required:
              - pageNumber
              - features
        extractions:
          type: object
        state:
          type: number
        stateText:
          type: string
        totalDocPages:
          type: number
        customerName:
          type: string
        slaMoment:
          type: string
        isSync:
          type: boolean
        isHumanRevision:
          type: boolean
        finishedAt:
          type: string
        slaDelta:
          type: string
      required:
        - id
        - createdAt
        - changedAt
        - rejected
        - reviewed
        - originalFileName
        - originalDocumentClass
        - annotations
        - extractions
        - state
        - stateText
        - totalDocPages
        - customerName
        - slaMoment
        - isSync
        - isHumanRevision
        - finishedAt
        - slaDelta
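
To show how the annotations tree of an AnnotatedDocument might be read, the sketch below flattens it into one row per field. It assumes a payload shaped as in the schema above; the function name and the sample document are hypothetical, and subFields are deliberately ignored because their structure is not specified here.

```python
from typing import Any


def collect_field_values(document: dict[str, Any]) -> list[tuple[int, str, str, str]]:
    """Return (pageNumber, feature name, field name, fieldValue) rows."""
    rows = []
    for page in document.get("annotations", []):
        for feature in page["features"].values():
            for field in feature["fields"].values():
                rows.append((
                    page["pageNumber"],
                    feature["name"],
                    field["name"],
                    field["fieldValue"],
                ))
    return rows


# Hypothetical document, trimmed to the properties used by the helper.
sample_document = {
    "id": "doc-1",
    "annotations": [
        {
            "pageNumber": 0,
            "features": {
                "f-1": {
                    "uuid": "f-1",
                    "name": "invoice",
                    "fields": {
                        "d-1": {"uuid": "d-1", "name": "total", "fieldValue": "120.50"},
                        "d-2": {"uuid": "d-2", "name": "currency", "fieldValue": "EUR"},
                    },
                }
            },
        }
    ],
}

for row in collect_field_values(sample_document):
    print(row)
```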