Schemas used by the API

Please consider that:

  • Only one model is used as the baseline (defined in the Model schema, field IsBaseLine=true).

  • The content of the document obtained as the baseline is in the Comparison Output schema, Baseline field.

  • The content of the document obtained from each non-baseline model is in the Model schema, RawExtraction field.

  • The comparison of each model document against the baseline is in the Model schema, Diff field.

  • The Diff field contains an array of features grouped by page. Features extracted by both models (baseline and non-baseline) have extraction=true. Each feature contains fields, and each document field has the following data:

    • baselineValue: the content obtained from the baseline.
    • extractionValue: the content obtained from the current model.
    • isMatch (*): true when both values are equal.
    • levenshteinDistance (*): the Levenshtein distance between the baseline value and the extraction value.
    • similarityScore (*): the similarity score between the baseline value and the extraction value. Higher numbers mean higher similarity; values range from 0 (totally different) to 100 (equal). The formula is:

    similarityScore = (1 - levenshteinDistance / Max(Length(baselineValue), Length(extractionValue))) * 100

    (*) The corresponding TAICI fields have the same meaning, but the comparison ignores spaces, case, and accents.
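
The metrics above can be reproduced with a short Python sketch. It is not the service's implementation: the function names are hypothetical, the values are assumed to be plain strings, two empty values are assumed to score 100, and the TAICI normalization (strip accents, drop whitespace, ignore case) is one plausible reading of the note above.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def similarity_score(baseline: str, extraction: str) -> float:
    """(1 - distance / max length) * 100, as in the formula above."""
    longest = max(len(baseline), len(extraction))
    if longest == 0:
        return 100.0  # assumption: two empty values count as equal
    return (1 - levenshtein(baseline, extraction) / longest) * 100


def normalize_taici(value: str) -> str:
    """Assumed TAICI normalization: no accents, no whitespace, no case."""
    decomposed = unicodedata.normalize("NFD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(no_accents.split()).casefold()


baseline, extraction = "José Pérez", "JOSE  PEREZ"
print(levenshtein(baseline, extraction))
print(levenshtein(normalize_taici(baseline), normalize_taici(extraction)))
```

Under these assumptions the plain comparison reports a non-zero distance for the two sample values, while the normalized (TAICI-style) comparison treats them as equal.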


Comparison Output

      type: array
      items:
        type: object
        properties:
          Customer:
            type: string
            description: Customer name
          Version:
            type: string
            description: ComparisonOutput schema version
          DocumentId:
            type: string
            description: Document identifier requested for comparison
          CallbackUrl:
            type: string
            description: URI of the customer callback service
          CallbackMethod:
            type: string
            description: HTTP method used to call the customer callback service
          CallbackHeaders:
            type: object
            description: Dictionary of HTTP headers (key and value) used to call the customer callback service
          comparatorId:
            type: string
            description: Comparison identifier, used to distinguish multiple requests for the same document
          state:
            type: string
            enum: [CREATED, PROCESSING, COMPLETED, ERROR]
          ComparatorCreatedAt:
            type: string
            format: date-time
          Models:
            type: array
            items:
              $ref: '#/Model'
          Baseline:
            type: object
            description: "Schema of the document used as the baseline. Example: '#/AnnotatedDocument'"
        required:
          - Customer
          - Version
          - DocumentId
          - CallbackUrl
          - CallbackMethod
          - CallbackHeaders
          - comparatorId
          - state
          - ComparatorCreatedAt
          - Models
          - Baseline
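
As an illustration of the baseline rule, the following Python sketch separates the baseline model from the others in one item of the Comparison Output array. The function name split_models and the sample payload are hypothetical; only the Models and IsBaseLine keys come from the schemas.

```python
def split_models(comparison: dict) -> tuple[dict, list[dict]]:
    """Return (baseline model, non-baseline models) for one comparison item."""
    models = comparison.get("Models", [])
    baselines = [m for m in models if m.get("IsBaseLine") is True]
    if len(baselines) != 1:
        # The documentation states that only one model is the baseline.
        raise ValueError(
            f"expected exactly one baseline model, found {len(baselines)}"
        )
    others = [m for m in models if m.get("IsBaseLine") is not True]
    return baselines[0], others


# Made-up payload, reduced to the keys the function reads.
sample = {
    "Models": [
        {"name": "model-a", "IsBaseLine": True},
        {"name": "model-b", "IsBaseLine": False},
    ]
}
baseline, others = split_models(sample)
print(baseline["name"], [m["name"] for m in others])
```

Raising an error when the count is not exactly one is a design choice of this sketch; a client could equally log the problem and skip the item.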

Model

      type: object
      properties:
        name:
          type: string
          description: Name of the model
        IsBaseLine:
          type: boolean
          description: True when the model is used to get the baseline document
        Diff:
          type: array
          items:
            type: array
            items:
              type: object
              properties:
                pageNumber:
                  type: number
                  description: Page number, starting at 0 (page 0 is the first page)
                features:
                  type: object
                  properties:
                    <feature_UUID>:
                      type: object
                      description: Dictionary entry keyed by the UUID of each feature
                      properties:
                        uuid:
                          type: string
                          description: Feature unique identifier
                        name:
                          type: string
                          description: Feature name
                        label:
                          type: string
                          description: Feature label
                        extraction:
                          type: boolean
                          description: True when feature was extracted
                        fields:
                          type: object
                          properties:
                            <field_UUID>:
                              $ref: '#/DiffField'
                              description: Dictionary entry keyed by the UUID of each field
                          required:
                            - <field_UUID>
                      required:
                        - uuid
                        - name
                        - label
                        - extraction
                        - fields
                  required:
                    - <feature_UUID>
              required:
                - pageNumber
                - features
        RawExtraction:
          type: object
          description: "Schema of the document extracted by this model. Example: '#/AnnotatedDocument'"
      required:
        - name
        - IsBaseLine
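
A client will typically walk the Diff field to find the fields that differ from the baseline. The Python sketch below assumes the nesting declared in the schema (an array of arrays of page objects, with features and fields as dictionaries keyed by UUID); the function name iter_mismatches and the sample data are hypothetical.

```python
from typing import Iterator


def iter_mismatches(model: dict) -> Iterator[dict]:
    """Yield one record per field whose isMatch flag is false."""
    for page_group in model.get("Diff", []):
        for page in page_group:
            for feature in page["features"].values():
                if not feature["extraction"]:
                    continue  # feature was not extracted by both models
                for field in feature["fields"].values():
                    if not field["isMatch"]:
                        yield {
                            "page": page["pageNumber"],
                            "feature": feature["name"],
                            "field": field["name"],
                            "baseline": field["baselineValue"],
                            "extraction": field["extractionValue"],
                            "similarity": field["similarityScore"],
                        }


# Made-up model with a single mismatching field.
sample_model = {
    "name": "model-b",
    "IsBaseLine": False,
    "Diff": [[{
        "pageNumber": 0,
        "features": {
            "feature-1": {
                "uuid": "feature-1", "name": "header", "label": "Header",
                "extraction": True,
                "fields": {
                    "field-1": {
                        "uuid": "field-1", "name": "total",
                        "baselineValue": "100", "extractionValue": "10",
                        "similarityScore": 66.7, "isMatch": False,
                    },
                },
            },
        },
    }]],
}
for mismatch in iter_mismatches(sample_model):
    print(mismatch)
```

Replacing isMatch with isMatchTAICI in the condition gives the same report with spaces, case, and accents ignored.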

DiffField

      type: object
      properties:
        uuid:
          type: string
          description: Field unique identifier
        name:
          type: string
          description: Field name
        label:
          type: string
          description: Field label
        dataType:
          type: number
          description: Field data type code
        dataTypeText:
          type: string
          description: Field data type description
        baselineValue:
          type: object
          description: Field value from the baseline document
        extractionValue:
          type: object
          description: Field value from this extractor
        levenshteinDistance:
          type: number
          description: Levenshtein distance between the baseline value and extractor value
        similarityScore:
          type: number
          description: Similarity score between the baseline value and the extraction value, from 0 to 100. Higher numbers mean higher similarity; 100 means equal.
        isMatch:
          type: boolean
          description: True when both values are equal
        levenhsteinDistanceTAICI:
          type: number
          description: Same as levenshteinDistance, but ignoring spaces, case, and accents
        similarityScoreTAICI:
          type: number
          description: Same as similarityScore, but based on levenhsteinDistanceTAICI
        isMatchTAICI:
          type: boolean
          description: True when both values are equal after ignoring spaces, case, and accents
      required:
        - uuid
        - name
        - label
        - dataType
        - dataTypeText
        - baselineValue
        - extractionValue
        - levenshteinDistance
        - similarityScore
        - isMatch
        - levenhsteinDistanceTAICI
        - similarityScoreTAICI
        - isMatchTAICI

AnnotatedDocument (example used by Get Document)

      type: object
      properties:
        id:
          type: string
        createdAt:
          type: string
        changedAt:
          type: string
        rejected:
          type: boolean
        reviewed:
          type: boolean
        originalFileName:
          type: string
        originalDocumentClass:
          type: string
        annotations:
          type: array
          items:
            type: object
            properties:
              pageNumber:
                type: number
              features:
                type: object
                properties:
                  <feature_UUID>:
                    type: object
                    properties:
                      uuid:
                        type: string
                      name:
                        type: string
                      label:
                        type: string
                      createdAt:
                        type: string
                      origin:
                        type: number
                      originText:
                        type: string
                      confidence:
                        type: number
                      fields:
                        type: object
                        properties:
                          <field_UUID>:
                            type: object
                            properties:
                              uuid:
                                type: string
                              name:
                                type: string
                              label:
                                type: string
                              dataType:
                                type: number
                              dataTypeText:
                                type: string
                              createdAt:
                                type: string
                              confidence:
                                type: number
                              fieldValue:
                                type: string
                              origin:
                                type: number
                              originText:
                                type: string
                              annotationInformation:
                                type: object
                              subFields:
                                type: object
                            required:
                              - uuid
                              - name
                              - label
                              - dataType
                              - dataTypeText
                              - createdAt
                              - confidence
                              - fieldValue
                              - origin
                              - originText
                              - annotationInformation
                              - subFields
                        required:
                          - <field_UUID>
                    required:
                      - uuid
                      - name
                      - label
                      - createdAt
                      - origin
                      - originText
                      - confidence
                      - fields
                required:
                  - <feature_UUID>
            required:
              - pageNumber
              - features
        extractions:
          type: object
        state:
          type: number
        stateText:
          type: string
        totalDocPages:
          type: number
        customerName:
          type: string
        slaMoment:
          type: string
        isSync:
          type: boolean
        isHumanRevision:
          type: boolean
        finishedAt:
          type: string
        slaDelta:
          type: string
      required:
        - id
        - createdAt
        - changedAt
        - rejected
        - reviewed
        - originalFileName
        - originalDocumentClass
        - annotations
        - extractions
        - state
        - stateText
        - totalDocPages
        - customerName
        - slaMoment
        - isSync
        - isHumanRevision
        - finishedAt
        - slaDelta
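
To read field values out of an AnnotatedDocument (for example the Baseline or RawExtraction content), the nested annotations can be flattened into rows. This is a minimal Python sketch; the function name flatten_annotations and the sample document are hypothetical, and only the keys it reads are taken from the schema above.

```python
def flatten_annotations(document: dict) -> list[tuple[int, str, str, str]]:
    """Return (pageNumber, feature name, field name, fieldValue) rows."""
    rows = []
    for page in document.get("annotations", []):
        # features and fields are dictionaries keyed by UUID
        for feature in page["features"].values():
            for field in feature["fields"].values():
                rows.append((
                    page["pageNumber"],
                    feature["name"],
                    field["name"],
                    field["fieldValue"],
                ))
    return rows


# Made-up document, reduced to the keys the function reads.
sample_document = {
    "id": "doc-1",
    "annotations": [{
        "pageNumber": 0,
        "features": {
            "feature-1": {
                "name": "header",
                "fields": {
                    "field-1": {"name": "total", "fieldValue": "100"},
                    "field-2": {"name": "date", "fieldValue": "2020-01-01"},
                },
            },
        },
    }],
}
for row in flatten_annotations(sample_document):
    print(row)
```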