Configuration

This section describes how to configure Metadata Studio with custom data and a custom schema for a particular use case. It is intended for users who integrate Metadata Studio into their own projects and environments.

Note

This documentation does not describe deployment specifics. For those, see the deployment instructions.

Introduction

As described in more detail in the application’s data model, the main objects in Metadata Studio are:

  • Users
  • Projects
  • Corpora
  • Documents
  • Annotations
  • Concepts
  • SavedReports
  • Annotation Services

Metadata Studio configurations are kept in GraphDB. The configuration is split into two segments:

  • The model of the configuration data (see Classes Model) - defines the classes with which Metadata Studio works. It is described as a SOML schema.
  • The concrete objects in a Metadata Studio installation - based on the defined schema model, objects can be created either through RDF or by applying create mutations at runtime.

By default, no annotation services are configured, so if you want to use third-party text mining API services, you need to configure them as well.

The following is a configuration example of a schema request body customized for our Knowledge Net use case, which defines a specific simple document class and a Person inline annotation class that links to Wikidata people concepts. It comprises two segments:

  • the default SOML schema

    Important

    This part of the schema is identical for all Metadata Studio projects. It is not recommended to modify it in any way.

    id:           /soml/knowledge-net
    label:        MANT vocabulary
    created:      2021-08-13
    versionInfo:  1.0
    config: {lang: "ALL:en,NONE", implicit: "en", enable_mutations: true, disabledChecks: "rangeCheck", exposeSomlInGraphQL: true}
    
    prefixes:
      # common prefixes
      so: "http://www.ontotext.com/semantic-object/"
      dct: "http://purl.org/dc/terms/"
      gn: "http://www.geonames.org/ontology#"
      owl: "http://www.w3.org/2002/07/owl#"
      puml: "http://plantuml.com/ontology#"
      rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      rdfs: "http://www.w3.org/2000/01/rdf-schema#"
      skos: "http://www.w3.org/2004/02/skos/core#"
      void: "http://rdfs.org/ns/void#"
      wgs84: "http://www.w3.org/2003/01/geo/wgs84_pos#"
      xsd: "http://www.w3.org/2001/XMLSchema#"
    
      sys: "http://ontotext.com/soml/"
      omds: "http://www.ontotext.com/metadatastudio#"
      usr: "http://www.ontotext.com/metadatastudio/user/"
      wd: "http://www.wikidata.org/entity/"
      wdt: "http://www.wikidata.org/prop/direct/"
      inst: "http://www.ontotext.com/connectors/elasticsearch/instance#"
      elastic: "http://www.ontotext.com/connectors/elasticsearch#"
    
    specialPrefixes:
      vocab_prefix: voc
      base_iri:          http://knowledge.net/
      vocab_iri:         http://knowledge.net/
    
    properties:
      metadata: {range: Metadata, max: inf, rdfProp: omds:metadata}
      updateable: {range: boolean, min: 0, max: 1, rdfProp: omds:updateable}
      label: {rdfProp: rdfs:label, min: 1}
      status: {range: string, max: 1, rdfProp: omds:status, descr: "The object status, like ACTIVE or ARCHIVED etc."}
    
    objects:
      ### SOML Extension ###
    
      sys:SomlExtension:
        kind: abstract
        descr: "Base class for SOML Extension classes"
        props:
          sys:somlId: {descr: "ID of the SOML for which the Object is created"}
    
      sys:SpecialPrefixes:
        inherits: sys:SomlExtension
        descr: "Special prefixes (namespaces)"
        props:
          sys:baseIri: {descr: "Base IRI for data (resources), used in SOML characteristics such as type and prefix"}
          sys:vocabIri: {descr: "Default namespace for vocabulary (ontology) terms, i.e. object and prop names"}
    
      sys:Prefix:
        inherits: sys:SomlExtension
        descr: "Known prefix (namespace)"
        props:
          sys:id: {min: 1, descr: "Namespace prefix"}
          sys:iri: {min: 1, descr: "Namespace IRI"}
    
      sys:ObjectClass:
        inherits: sys:SomlExtension
        descr: "Base class for extending SOML with new Objects"
        pattern: "${sys_id}"
        props:
          sys:id: {min: 1, descr: "ID of the Object in the SOML"}
          sys:kind: {descr: "Abstract or not", valuesIn: ["abstract", "object"]}
          sys:inherits: {range: iri, descr: "Class to inherit from"}
          sys:label: {descr: "Label of the Object in the SOML"}
          sys:descr: {descr: "Description or clarification"}
          sys:type: {max: inf, descr: "Array of type value IRIs (prefixed, relative, or absolute)"}
          sys:typeProp: {descr: "Property that determines the business type"}
          sys:sparqlFederatedService: {descr: "The ID of a SPARQL Federation Service"}
          sys:meta: { range: sys:Meta, max: inf, descr: "Adds an additional meta directive in the GraphQL schema for the given property" }
          sys:props: {range: sys:Property, max: inf, descr: "Array of Properties of the Object"}
    
      sys:Property:
        inherits: sys:SomlExtension
        descr: "Class representing a Property for SOML Extension"
        props:
          sys:id: {min: 1, descr: "ID of the Property in the SOML"}
          sys:label: {descr: "Label of the Property in the SOML"}
          sys:descr: {descr: "Description or clarification"}
          sys:rdfProp: {descr: "RDF property name (if not allowed in GraphQL or hard to read) or SPARQL Template"}
          sys:range: {descr: "Datatype or SOML object type"}
          sys:min: {range: int, descr: "Minimum number of values, integer (mutations)"}
          sys:max: {descr: "Maximum number of values, integer. inf means unlimited (mutations)"}
          sys:restrictive: {range: boolean, descr: "Controls the SPARQL generation. Properties set as true would not generate OPTIONAL"}
          sys:meta: {range: sys:Meta, max: inf, descr: "Adds an additional meta directive in the GraphQL schema for the given property"}
    
      sys:Meta:
        descr: "Metadata for a given SOML Object/Property"
        props:
          sys:key: {min: 1, descr: "Key of the data"}
          sys:values: {min: 1, descr: "Value of the data"}
    
      ### Metadata Studio System ###
    
      Object:
        kind: abstract
        props:
          id:
            label: "ID"
            range: iri
            min: 1
            meta: {search: {visible: true, order: 0}, form: {visible: true, editable: false, order: 0}}
          type:
            range: iri
            max: inf
            meta: {search: {visible: false}, form: {visible: false}}
    
      NamedEntity:
        kind: abstract
        props:
          id:
          label:
          annotationContext: {inverseAlias: createdBy, range: Annotation, rangeCheck: false}
    
      User:
        type: omds:User
        inherits: NamedEntity
        pattern: "usr:${username}"
        props:
          # OMDS expects properties with names 'username' where the user's username would go
          # and also a 'label' where if configured the user display name will go
          label: {readOnly: true}
          username: {readOnly: true, rdfProp: omds:username}
          fullName: {readOnly: true, rdfProp: omds:fullName}
          email: {readOnly: true, rdfProp: omds:email}
          avatar: {rdfProp: omds:avatar}
          settings: {rdfProp: omds:settings}
    
      Timesensitive:
        kind: abstract
        props:
          metadata: {meta: {search: {visible: false}, form: {visible: false}}}
          createdAt:
            label: "Created at"
            range: dateTime
            min: 1
            max: 1
            rdfProp: omds:createdAt
            meta: {search: {visible: true, order: 5}, form: {visible: true, editable: false, order: 5}}
          createdBy:
            label: "Created by"
            range: NamedEntity
            min: 1
            max: 1
            rdfProp: omds:createdBy
            meta: {search: {visible: true, order: 6}, form: {visible: true, editable: false, order: 6}}
          modifiedAt:
            label: "Modified at"
            range: dateTime
            min: 0
            max: 1
            rdfProp: omds:modifiedAt
            meta: {search: {visible: true, order: 7}, form: {visible: true, editable: false, order: 7}}
          modifiedBy:
            label: "Modified by"
            range: NamedEntity
            min: 0
            max: 1
            rdfProp: omds:modifiedBy
            meta: {search: {visible: true, order: 8}, form: {visible: true, editable: false, order: 8}}
    
      SavedReport:
        inherits: Timesensitive
        type: omds:SavedReport
        props:
          corpusId:
          label:
          reportType:
            label: "The report type"
          data:
            label: "Holds report generated data as serialized JSON"
          config:
            label: "Holds configuration used during report generation"
    
      Project:
        inherits: Timesensitive
        type: omds:Project
        props:
          label: {label: "Label", meta: {search: {visible: true, order: 1}, form: {visible: true}}}
          status: {label: "Status", meta: {search: {visible: true, order: 2}, form: {visible: true}}}
          metadata: {meta: {search: {visible: false}, form: {visible: false}}}
          corpus: {range: Corpus, max: inf, rdfProp: omds:corpus, meta: {search: {visible: false}, form: {visible: false}}}
          logo: {range: iri, rdfProp: omds:logo, meta: {search: {visible: false}, form: {visible: false}}}
    
      Corpus:
        inherits: Timesensitive
        type: omds:Corpus
        props:
          label: {label: "Label", meta: {search: {visible: true, order: 1}, form: {visible: true}}}
          status: {label: "Status", meta: {search: {visible: true, order: 2}, form: {visible: true}}}
          metadata: {meta: {search: {visible: false}, form: {visible: false}}}
          allowedUsers: {max: inf, rdfProp: omds:allowedUsers, meta: {search: {visible: false}, form: {visible: false}}}
          document: {range: Document, max: inf, rdfProp: omds:document, meta: {search: {visible: false}, form: {visible: false}}}
          documentCount:
            label: "Documents count"
            meta: {search: {visible: true, order: 10}, form: {visible: false}}
            range: int
            max: 1
            rdfProp: |
              select  ?_subject (count(?documentId) as ?_value) where {
                ?_subject omds:document ?documentId.
              } group by ?_subject VALUES ?_subject {}
          project: {range: Project, inverseAlias: corpus, meta: {search: {visible: false}, form: {visible: false}}}
    
      Document:
        inherits: Timesensitive
        kind: abstract
        props:
          label: {label: "Label", min: 0, range: stringOrLangString, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true}}}
          metadata: {meta: {search: {visible: false}, form: {visible: false}}}
          text: {label: "Text", min: 1, max: 1, rdfProp: omds:text, meta: {search: {visible: false}, form: {visible: true, editable: true}}}
          annotations: {range: Annotation, max: inf, rdfProp: omds:annotations, meta: {search: {visible: false}, form: {visible: false}}}
          annotationsCount:
            label: "Annotations count"
            meta: {search: {visible: true, order: 10}, form: {visible: false}}
            range: int
            max: 1
            rdfProp: |
              select  ?_subject (count(?annotationId) as ?_value) where {
                ?_subject omds:annotations ?annotationId.
              } group by ?_subject VALUES ?_subject {}
          annotationsModifiedAt:
            label: "Annotations modified at"
            meta: {search: {visible: true, order: 11}, form: {visible: false}}
            range: dateTime
            max: 1
            rdfProp: |
              select  ?_subject (?lastModified as ?_value)
              where {
                ?_subject omds:annotations ?annotationId.
                ?annotationId omds:createdAt|omds:modifiedAt ?lastModified .
              } order by DESC (?lastModified) limit 1 VALUES ?_subject { }
          corpus: {range: Corpus, inverseAlias: document, meta: {search: {visible: false}, form: {visible: false}}}
    
      Metadata:
        type: omds:Metadata
        props:
          field: {min: 1, max: 1, rdfProp: omds:field}
          values: {max: inf, range: string, rdfProp: omds:values}
    
      AnnotationService:
        inherits: NamedEntity
        type: omds:AnnotationService
        props:
          label:
          serviceId: {min: 1}
          annotationQuery: {rdfProp: omds:annotationQuery, min: 1, max: 1}
          registrationQuery: {rdfProp: omds:registrationQuery, min: 1, max: 1}
          metadata: {meta: {search: {visible: false}, form: {visible: false}}}
          createdAt: {label: "Created at", range: dateTime, min: 1, max: 1, rdfProp: omds:createdAt, meta: {search: {visible: true, order: 5}, form: {visible: true, order: 5}}}
          createdBy: {label: "Created by", range: iri, min: 1, max: 1, rdfProp: omds:createdBy, meta: {search: {visible: true, order: 6}, form: {visible: true, order: 6}}}
          modifiedAt: {label: "Modified at", range: dateTime, min: 0, max: 1, rdfProp: omds:modifiedAt, meta: {search: {visible: true, order: 7}, form: {visible: true, order: 7}}}
          modifiedBy: {label: "Modified by", range: iri, min: 0, max: 1, rdfProp: omds:modifiedBy, meta: {search: {visible: true, order: 8}, form: {visible: true, order: 8}}}
    
      Concept:
        kind: abstract
        name: label
        props:
          label: {label: "Label", max: inf, range: stringOrLangString, meta: {search: {visible: true, order: 1}, form: {visible: true, order: 1}}}
          metadata: {meta: {preview: {fields: ["label"]}, search: {visible: false}, form: {visible: false, editable: false}}}
    
      Annotation:
        kind: abstract
        inherits: Timesensitive
        props:
          name: {label: "Name", meta: {search: {visible: true, editable: false, order: 1}, form: {visible: false, editable: false}}}
          type: {meta: {search: {visible: false}, form: {visible: false}}}
          document: {range: Document, inverseAlias: annotations, meta: {search: {visible: false}, form: {visible: false}}}
    
      InlineAnnotation:
        kind: abstract
        inherits: Annotation
        props:
          metadata: { meta: { preview: { fields: [ "key" ] } } }
          annotationStart: {label: "Annotation start", range: int, rdfProp: omds:annotationStart, meta: {search: {visible: true}, form: {visible: true, order: 2}}}
          annotationEnd: {label: "Annotation end", range: int, rdfProp: omds:annotationEnd, meta: {search: {visible: true}, form: {visible: true, order: 3}}}
          key:
            meta: {search: {visible: false}, form: {visible: false}}
            max: inf
            rdfProp: |
              ?_subject ^omds:annotations/omds:text ?text ;
                        omds:annotationStart ?start ;
                        omds:annotationEnd ?end .
              bind (SUBSTR(?text, ?start + 1, ?end - ?start) as ?_value)
          snippet:
            meta: {search: {visible: false}, form: {visible: false}}
            max: inf
            rdfProp: |
              ?_subject ^omds:annotations/omds:text ?text ;
                        omds:annotationStart ?start ;
                        omds:annotationEnd ?end .
              bind (<http://www.ontotext.com/js#getSnippet>(?text, ?start, ?end) as ?_value)
    
      DocumentAnnotation:
        kind: abstract
        inherits: Annotation
        props:
          metadata: { meta: {preview: {fields: ["id", "wikidata.label", "wikidata.id"]}, search: {visible: false}, form: {visible: false, editable: false}} }
    
      Form:
        kind: abstract
    
  • project-specific definitions: also a required part of the schema, but customized for the respective project

  SimpleDocument:
    inherits: Document
    type: omds:Document

  Person:
    inherits: Concept
    sparqlFederatedService: wikidata
    typeProp: generatedType
    type: wd:Q5
    props:
      generatedType: {rdfProp: "wdt:P31", meta: {search: {visible: false}, form: {visible: false}}}
      search:
        meta: {search: {visible: false}, form: {visible: false}}
        restrictive: true
        rdfProp: |
          ?_subject  rdfs:label ?label. filter (regex(str(?label), {{query}}, \"i\")).

  PersonAnnotation:
    inherits: InlineAnnotation
    type: http://knowledge.net/Annotation/Person
    props:
      wikidata: {range: Person, meta: {search: {visible: true}, form: {visible: true, editable: true}}}
      metadata: {meta: {preview: {fields: ["wikidata.label"]}, search: {visible: false}, form: {visible: false, editable: false}}}

### RBAC definitions ###
rbac:
  roles:
    Default:
      description: "Everyone can read everything"
      actions:
        - "*/*/*"

    Admin:
      description: "Administrator role, can read, write and delete objects"
      actions:
        - "*/*/*"

    Curator:
      actions:
        - "Project/*/read"
        - "Corpus/*/read"
        - "Document/*/read"
        - "Document/annotations/*"
        - "Concept/*/read"
        - "Concept/id/write"
        - "Annotation/*/*/(where: {createdBy: {_ifUser: {username: {IRE: ${ctx.claims.preferred_username}}}}})"

Classes Model

By default, Metadata Studio is started with a SOML schema describing the basic Metadata Studio classes based on a specific RDF model.

The schema is kept in the otp-system repository in GraphDB. Any data that comes in through runtime mutations is validated against this model.

The schema must describe any specific document classes, concept classes, and annotations based on the specific user needs.

The default schema can be overwritten in any of the following ways:

  • by changing the initial schema on deployment
  • at runtime through GraphQL mutations
  • at runtime through the Metadata Studio UI

The schema does not contain any specific Concepts or Annotations classes. The following classes can be configured through the UI:

  • Documents
  • Annotations
  • Concepts

Currently, Metadata Studio does not support defining custom Project, Corpus, SavedReport, User, and AnnotationService types that extend the original ones.

The following sections describe how to define custom Document, Annotation, and Concept classes.

RDF model

In the base RDF model of Metadata Studio, predicates defined on the abstract classes, such as TimeSensitive and NamedEntity, are inherited by the more specific classes that extend them.

Objects and predicates

Metadata Studio uses the following objects, each with the predicates listed under it:

  • omds:Metadata: Key-value object that contains various metadata.

    • omds:field: The name of the field.
    • omds:values: The values for the field.
  • omds:TimeSensitive

    • omds:createdAt: The time at which the resource was created.
    • omds:createdBy: Link to the user IRI that created the resource.
    • omds:modifiedAt: The time at which the resource was last modified. Note that only the Metadata Studio UI updates this value automatically, so whenever you change an object through the /graphql endpoint or through RDF, you need to update the modifiedAt value for this object yourself.
    • omds:modifiedBy: Link to the IRI of the user who last modified the resource. Note that only the Metadata Studio UI updates this value automatically, so whenever you change an object through the /graphql endpoint or through RDF, you need to update the modifiedBy value for this object yourself.
  • omds:Project: The project type.

    • omds:status: The status of the project. Possible values are “ACTIVE” and “ARCHIVED”. Archived projects cannot be edited further.
    • omds:corpus: Link to a corpus that is part of the project.
    • rdfs:label: The label of the project that is displayed in the Metadata Studio UI Projects view.
  • omds:Corpus: The corpus type.

    • omds:status: The status of the corpus. Possible values are “ACTIVE” and “ARCHIVED”. Archived corpora cannot be edited further.
    • omds:document: Link to a document that is part of the corpus.
    • rdfs:label: The label of the corpus that is displayed in the Metadata Studio UI Projects view.
  • omds:Document: The abstract document type.

    • omds:text: The content of the document.
    • rdfs:label: The label of the document that is displayed in the Metadata Studio UI Corpus view.
  • omds:Annotation: The abstract base annotation type. It cannot be extended directly - instead, either the InlineAnnotation or the DocumentAnnotation class must be extended.

  • omds:DocumentAnnotation: The abstract document annotation type.

  • omds:InlineAnnotation: The abstract inline annotation type.

    • omds:annotationStart: The start positioning offset of the inline annotation.
    • omds:annotationEnd: The end positioning offset of the inline annotation.
  • omds:Concept: The abstract concept type.

  • omds:SavedReport: The saved report’s type.

    • base-iri:corpusId: The IRI of the corpus as a string value.
    • base-iri:data: The report results serialized in JSON.
    • base-iri:config: The report configurations serialized in JSON.
    • base-iri:reportType: The type of the report - either “FREQUENCY_COOCCURRENCE” or “F1”.
    • rdfs:label: The label of the report that will be visualized in the Metadata Studio UI Reports view.
  • omds:NamedEntity: The abstract class that is the range value for omds:createdBy and omds:modifiedBy values for all resources.

  • omds:User: The user type.

    • omds:username: The username of the user. By default, this value is used to build the user’s identifier as described in the users creation section, so it needs to satisfy the requirements described in that section.
    • rdfs:label: The label for the user that is presented in the Metadata Studio UI.
  • omds:AnnotationService: The annotation service type. Unlike all the other objects in Metadata Studio which are stored in the omds GraphDB repository, some of the information about annotation services is stored in the otp-system repository.

    • omds:annotationQuery (in otp-system repository): The query that is used during corpus annotation for the particular annotation service.
    • omds:registrationQuery (in otp-system repository): The query with which the particular annotation service was registered in the omds repository.
    • base-iri:serviceId (in otp-system repository): The ID of the annotation service used by the UI.
    • rdfs:label (in omds repository): The label for the annotation service to visualize in the UI.
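
Because omds:modifiedAt and omds:modifiedBy are not maintained automatically when data is changed outside the UI, a client that writes RDF directly has to refresh them itself. The following is a minimal sketch of how such a SPARQL 1.1 Update could be assembled. It is not part of Metadata Studio: the function names are hypothetical, the object IRI is an invented example, and it assumes that the data lives in the default graph and that user IRIs follow the usr:${username} pattern from the default schema.

```python
from datetime import datetime, timezone

# Namespace IRIs taken from the prefixes section of the default schema.
OMDS = "http://www.ontotext.com/metadatastudio#"
USR = "http://www.ontotext.com/metadatastudio/user/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def _iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, rejecting characters
    that are not allowed inside a SPARQL IRI reference."""
    if any(ch in value for ch in '<>"{}|\\^` \n\t'):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def build_modified_update(object_iri: str, username: str, when: datetime) -> str:
    """Hypothetical helper: build an update that replaces the
    omds:modifiedAt and omds:modifiedBy values of one object.
    Assumes the triples are stored in the default graph."""
    obj = _iri(object_iri)
    user = _iri(USR + username)  # assumes the usr:${username} pattern
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"PREFIX omds: <{OMDS}>\n"
        f"PREFIX xsd: <{XSD}>\n"
        f"DELETE {{ {obj} omds:modifiedAt ?oldAt ; omds:modifiedBy ?oldBy . }}\n"
        f'INSERT {{ {obj} omds:modifiedAt "{stamp}"^^xsd:dateTime ; '
        f"omds:modifiedBy {user} . }}\n"
        "WHERE {\n"
        # OPTIONAL keeps the update working when no old values exist yet.
        f"  OPTIONAL {{ {obj} omds:modifiedAt ?oldAt }}\n"
        f"  OPTIONAL {{ {obj} omds:modifiedBy ?oldBy }}\n"
        "}"
    )
```

The resulting string can be sent to the repository's SPARQL update endpoint together with the actual data change, so that both are applied by the same client.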

Custom document classes

The default document class in Metadata Studio is defined as abstract. This means that the application relies on custom non-abstract document classes to be defined. In the default Metadata Studio schema, this is the SimpleDocument object.

A custom document class can be introduced, for example Article, Heard, CV, or MedicalPrescription. The custom document class can have custom fields that the user can fill in. These fields can be visualized in the Corpora view, or they can be used for filtering the documents in the Corpora view and in the Reports.

For example, if you want to define a Legal Contract document and specify the business purpose of the document as a category (such as Joint Venture agreement, NDA agreement, Employment contract, etc.), you can do this from the Manage schema view in the UI, or you can define a custom document class in the SOML schema as follows:

LegalContract:
    inherits: Document
    type: omds:LegalContract
    props:
        category: {label: "Category", range: string, min: 1, max: 1, rdfProp: omds:category, meta: {search: {visible: true}, form: {visible: true, editable: true}}}

Alternatively, you can apply the following mutation at runtime against the Metadata Studio backend /graphql endpoint:

mutation createCustomDocument {
  create_sys_ObjectClass(
    objects: {
      sys_id: "LegalContract"
      sys_inherits: "voc:Document"
      sys_type: "omds:LegalContract"
      sys_props: {
        sys_Property: [
          {
            sys_id: "category"
            sys_range: "string"
            sys_rdfProp: "omds:category"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
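
A mutation like the one above is sent as a regular GraphQL-over-HTTP POST request. The following minimal sketch shows how such a request could be prepared with the Python standard library. The helper name, the endpoint URL, and the bearer-token authentication are assumptions made for illustration. Check your installation for the actual host and security setup.

```python
import json
import urllib.request
from typing import Optional


def build_graphql_request(
    endpoint: str, query: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Hypothetical helper: wrap a GraphQL document in a JSON POST request.
    The Authorization header is only added when a token is given
    (bearer authentication is an assumption, not a documented requirement)."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


# Example (the URL is a placeholder for your Metadata Studio backend):
# request = build_graphql_request("http://localhost:8080/graphql", mutation_text)
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the payload before it reaches the backend.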

Once you have declared your custom document class, you can create actual documents in your corpus either through the Metadata Studio UI client or by inserting RDF data in GraphDB.

Custom annotation classes

You need to configure the specific annotation classes that you would like to create in your corpus. Each custom annotation class must extend one of the base annotation classes - either DocumentAnnotation (assigned as document tags) or InlineAnnotation (assigned to a specific substring of the document).
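
For inline annotations, the default schema derives the annotated substring (the key property) with SUBSTR(?text, ?start + 1, ?end - ?start). Since SPARQL's SUBSTR is 1-based, this means that annotationStart is a zero-based offset and annotationEnd is exclusive. The sketch below reproduces that computation in Python. The function name and the sample text are illustrative only.

```python
def annotated_text(text: str, annotation_start: int, annotation_end: int) -> str:
    """Return the substring covered by an inline annotation.
    Mirrors SUBSTR(?text, ?start + 1, ?end - ?start) from the default schema:
    zero-based start offset, exclusive end offset."""
    if not 0 <= annotation_start <= annotation_end <= len(text):
        raise ValueError("offsets must satisfy 0 <= start <= end <= len(text)")
    return text[annotation_start:annotation_end]


sample = "Marie Curie was born in Warsaw."  # invented sample document text
print(annotated_text(sample, 0, 11))   # Marie Curie
print(annotated_text(sample, 24, 30))  # Warsaw
```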

Besides the properties inherited from the base annotation class, each custom annotation can have custom properties. Each property must be defined with its property characteristics. A subset of the property characteristics supported by the Ontotext Platform Semantic Objects is also supported in Metadata Studio:

  • range: Specifies the class of the values of the property.
  • rdfProp: Specifies the RDF predicate with which the property is stored in GraphDB.
  • min: Currently, the highest supported value for all properties is 1. If the min cardinality for a field is 1, the UI enforces the user to enter a value for this field when creating annotations.
  • max: Currently, the highest supported value for all properties is 1.
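
To illustrate the effect of min, the following sketch checks which fields a user would still have to fill in before an annotation can be created. It is a simplified model, not the UI's actual validation code: the function name is hypothetical, and treating a missing min as 0 is an assumption.

```python
def missing_required(props: dict, values: dict) -> list:
    """Return the names of properties with min >= 1 that have no value yet.
    `props` maps property names to their characteristics (as in the SOML schema);
    `values` maps property names to the values entered so far."""
    missing = []
    for name, spec in props.items():
        required = int((spec or {}).get("min", 0)) >= 1  # assumed default: 0
        if required and values.get(name) in (None, "", []):
            missing.append(name)
    return missing


props = {"category": {"range": "string", "min": 1, "max": 1}, "note": {"max": 1}}
print(missing_required(props, {"note": "draft"}))    # ['category']
print(missing_required(props, {"category": "NDA"}))  # []
```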

In addition, a new property characteristic called meta is introduced. It controls how the UI visualizes and uses the property. The meta characteristic supports the following nested fields:

  • search: controls how the field is considered when the object is visualized in search views:

    • visible (type:boolean, default=false): Determines if the field is visible in search views, for example in the Document List view and in the Entity Linking view.
    • order (type: integer, default=-1): Determines the order in which the field is visualized, if visible. The fields are ordered in ascending order and all fields with order -1 are placed last.
  • form: Controls how the field is considered when an instance of the class is created from or visualized in the UI:

    • visible (type: boolean, default=false): Whether to visualize the field in creation/preview forms.
    • editable (type: boolean, default=true): If visible=true, whether the user is allowed to edit the field or not.
    • order (type: integer, default=-1): The order in which the fields are ordered. The fields are ordered in ascending order and all fields with order -1 are placed last.
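
The visibility and ordering rules above can be summarized in a few lines of Python. This is a sketch of the described behaviour under stated assumptions, not the UI's implementation: the function name is hypothetical, and keeping the schema order for fields with equal order values is an assumption.

```python
def visible_fields(props: dict, view: str) -> list:
    """Return the names of the fields shown in `view` ("search" or "form"),
    in display order: ascending by `order`, with order -1 placed last."""
    shown = []
    for name, spec in props.items():
        meta = (spec or {}).get("meta", {}).get(view, {})
        if meta.get("visible", False):                 # default: not visible
            shown.append((name, meta.get("order", -1)))  # default order: -1
    # Sort key: fields with order -1 go last, the rest ascending.
    # sorted() is stable, so ties keep their schema order (an assumption).
    shown.sort(key=lambda item: (item[1] == -1, item[1]))
    return [name for name, _ in shown]


props = {
    "annotationStart": {"meta": {"search": {"visible": True}}},
    "createdAt": {"meta": {"search": {"visible": True, "order": 5}}},
    "name": {"meta": {"search": {"visible": True, "order": 1}}},
    "type": {"meta": {"search": {"visible": False}}},
}
print(visible_fields(props, "search"))  # ['name', 'createdAt', 'annotationStart']
```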

These meta characteristics are configurable through the SOML schema and GraphQL mutations, but are not yet exposed for configuration from the Metadata Studio UI.

For more information on the properties, see the developer documentation.

Custom annotation classes can inherit either the DocumentAnnotation or the InlineAnnotation class.

DocumentAnnotation classes

Document annotations are annotations created for the whole document as opposed to for a specific part of the text. This is the more general way to create annotations that will fit most use cases where it is not vital to know where exactly in the document something was mentioned.

Document annotations can have custom fields assigned to them with specific metadata. For example, if we want to define a custom document annotation type for legal contract agreement dates, besides doing that through the Manage schema view, we could have the following configuration in the SOML schema:

AgreementDate:
  inherits: DocumentAnnotation
  type: http://cuad.ontotext.com/DocumentAnnotation/AgreementDate
  props:
    value: { range: string, rdfProp: omds:value, meta: { search: { visible: true, order: 1 }, form: { visible: true, editable: true, order: 1 } } }
    relevanceScore: { range: double, rdfProp: omds:relevanceScore, meta: { search: { visible: true, order: 2 }, form: { visible: true, editable: true, order: 2 } } }

where the value property will contain the actual date, for example “20 Nov 1991”.

Alternatively, the GraphQL mutation for registering this document annotation type would be:

mutation createCustomDocumentAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "AgreementDate"
      sys_inherits: "voc:DocumentAnnotation"
      sys_type: "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate"
      sys_props: {
        sys_Property: [
          {
            sys_id: "value"
            sys_range: "string"
            sys_rdfProp: "omds:value"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
          {
            sys_id: "relevanceScore"
            sys_range: "double"
            sys_rdfProp: "omds:relevanceScore"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 2}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 2}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}

Or you can use the UI Manage schema view to extend the DocumentAnnotation class with your custom definition.

In addition to custom fields, Metadata Studio also supports a metadata field with a meta characteristic specifying a preview field that controls how the object is visualized in simple previews. The preview option has a fields argument – a list of fields to be visualized in object preview. This is useful for limiting the information shown in the document annotation previews by removing unreadable fields with little value to the user such as identifiers, modifiedBy fields, etc. This feature currently applies to annotations only and does not impact Corpora, Projects, and Documents.
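
The effect of the preview fields list can be pictured as a projection over the annotation object. The sketch below is only an illustration: the function names are hypothetical, the sample annotation is invented, and resolving dotted names such as "wikidata.label" (as used in the default schema) by walking nested objects is an assumption.

```python
def resolve(obj: dict, path: str):
    """Follow a dotted path such as "wikidata.label" through nested dicts.
    Returns None when any step of the path is missing."""
    current = obj
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def preview(annotation: dict, fields: list) -> dict:
    """Keep only the configured preview fields that actually have a value."""
    result = {}
    for path in fields:
        value = resolve(annotation, path)
        if value is not None:
            result[path] = value
    return result


annotation = {  # invented sample data
    "id": "http://knowledge.net/annotation/1",
    "value": "20 Nov 1991",
    "relevanceScore": 0.9,
    "wikidata": {"label": "Marie Curie"},
}
print(preview(annotation, ["value"]))           # {'value': '20 Nov 1991'}
print(preview(annotation, ["wikidata.label"]))  # {'wikidata.label': 'Marie Curie'}
```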

If we want to define an AgreementDate document annotation, which contains the value of the date in the annotation preview, we would add the metadata field like this:

AgreementDate:
  inherits: DocumentAnnotation
  type: http://cuad.ontotext.com/DocumentAnnotation/AgreementDate
  props:
    value: { range: string, rdfProp: omds:value, meta: { search: { visible: true, order: 1 }, form: { visible: true, editable: true, order: 1 } } }
    relevanceScore: { range: double, rdfProp: omds:relevanceScore, meta: { search: { visible: true, order: 2 }, form: { visible: true, editable: true, order: 2 } } }
    metadata: { meta: { preview: { fields: [ "value" ] }, search: { visible: false }, form: { visible: false, editable: false } } }

The above SOML definition is equivalent to applying the following GraphQL mutation to the Metadata Studio backend:

mutation createCustomDocumentAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "AgreementDate"
      sys_inherits: "voc:DocumentAnnotation"
      sys_type: "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate"
      sys_props: {
        sys_Property: [
          {
            sys_id: "value"
            sys_range: "string"
            sys_rdfProp: "omds:value"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
          {
            sys_id: "relevanceScore"
            sys_range: "double"
            sys_rdfProp: "omds:relevanceScore"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 2}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 2}"
                }
              ]
            }
          }
          {
            sys_id: "metadata"
            sys_meta: {
              sys_Meta: [
                { sys_key: "preview", sys_values: "{ fields: [value] }" }
                { sys_key: "search", sys_values: "{ visible: false }" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: false, editable: false }"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
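To illustrate the effect of the preview option, the following Python sketch reduces an annotation to the fields listed in preview.fields. It is not Metadata Studio code: the function name preview_of and the dictionary shape of the annotation are assumptions made for illustration only.

```python
def preview_of(annotation: dict, preview_fields: list[str]) -> dict:
    """Keep only the fields named in preview.fields, in the configured order."""
    return {name: annotation[name] for name in preview_fields if name in annotation}


# Hypothetical AgreementDate annotation; the real backend response may look different.
annotation = {
    "id": "http://cuad.ontotext.com/annotation/1",
    "value": "2012-08-09",
    "relevanceScore": 0.93,
    "modifiedBy": "http://www.ontotext.com/metadatastudio/user/borislav",
}

# With preview: { fields: [ "value" ] } only the date value remains visible.
print(preview_of(annotation, ["value"]))
```

Identifiers and modifiedBy values are dropped from the preview because they are not listed, which is the behaviour that the fields argument is meant to achieve.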

All fields of the annotations can be used to sort the document annotations in the Document view. To sort the annotations by a specific field by default, add a defaultSortField to the preview option and set it to the name of that field. Upon opening the Document view, the document annotations are then sorted by the specified field value in ascending order. The last sorting selection in the Document view is saved as the user’s sorting preference, so from then on the annotations are sorted according to that selection.

For example:

mutation createCustomDocumentAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "AgreementDate"
      sys_inherits: "voc:DocumentAnnotation"
      sys_type: "http://cuad.ontotext.com/DocumentAnnotation/AgreementDate"
      sys_props: {
        sys_Property: [
          {
            sys_id: "value"
            sys_range: "string"
            sys_rdfProp: "omds:value"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
          {
            sys_id: "relevanceScore"
            sys_range: "double"
            sys_rdfProp: "omds:relevanceScore"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 2}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 2}"
                }
              ]
            }
          }
          {
            sys_id: "metadata"
            sys_range: "string"
            sys_rdfProp: "omds:value"
            sys_meta: {
              sys_Meta: [
                {
                  sys_key: "preview"
                  sys_values: "{ fields: [\"value\", \"relevanceScore\"], defaultSortField: \"relevanceScore\"}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
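The sorting rule described above can be sketched in Python as follows. This only mirrors the documented behaviour (ascending order by defaultSortField, overridden by the user's last selection); the function name sort_annotations and the list-of-dictionaries input are assumptions and do not reflect the actual UI implementation.

```python
from typing import Optional


def sort_annotations(annotations: list, default_sort_field: str,
                     user_preference: Optional[str] = None) -> list:
    """Sort ascending by the user's saved field if present, else by defaultSortField."""
    field = user_preference or default_sort_field
    return sorted(annotations, key=lambda annotation: annotation[field])


# Hypothetical AgreementDate annotations.
annotations = [
    {"value": "2010-01-15", "relevanceScore": 0.9},
    {"value": "2012-08-09", "relevanceScore": 0.4},
]

# First visit: sorted by defaultSortField ("relevanceScore").
by_default = sort_annotations(annotations, "relevanceScore")
# Later visits: the saved user selection ("value") takes precedence.
by_preference = sort_annotations(annotations, "relevanceScore", user_preference="value")
```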

InlineAnnotation classes

Inline annotations are annotations that are applicable only to a specific subset of a document. They are used when it is important to know where exactly in the document something was mentioned. This information is particularly important when preparing gold standard corpora for machine learning purposes, as it is useful for the algorithm to have this data.

For example, if you want to be able to create inline tags for people in your documents, you can add the following snippet to your SOML schema:

PersonAnnotation:
    inherits: InlineAnnotation
    type: http://knowledge.net/Annotation/Person
    props:
      name: {range: string, rdfProp: omds:name, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true, order: 1}}}

This is equivalent to applying the following GraphQL mutation to the Metadata Studio backend:

mutation createCustomInlineAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "PersonAnnotation"
      sys_inherits: "voc:InlineAnnotation"
      sys_type: "http://knowledge.net/Annotation/Person"
      sys_props: {
        sys_Property: [
          {
            sys_id: "name"
            sys_range: "string"
            sys_rdfProp: "omds:name"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}

If you have People concepts present in your database and you want to link concepts from your reference dataset to the annotations (i.e., perform entity linking), you need to declare your custom concept class as part of the schema (see how to customize concept classes). Then, use the concept class you created as a range of the property through which you would like to establish the link.

For example, the following snippet defines a PersonAnnotation inline annotation class that has a personEntity property whose values are instances of the Person class:

PersonAnnotation:
    inherits: InlineAnnotation
    type: http://knowledge.net/Annotation/Person
    props:
      personEntity: {range: Person, rdfProp: omds:person, meta: {search: {visible: true, order: 1}, form: {visible: true, editable: true, order: 1}}}

Or you can use a GraphQL mutation instead:

mutation createCustomInlineAnnotation {
  create_sys_ObjectClass(objects: {
    sys_id: "PersonAnnotation"
    sys_inherits: "voc:InlineAnnotation"
    sys_type: "http://knowledge.net/Annotation/Person"
    sys_props: {sys_Property: [
      {sys_id: "personEntity", sys_range: "Person", sys_rdfProp: "omds:person",
        sys_meta: {
          sys_Meta:[
            {sys_key: "search", sys_values: "{ visible: true, order: 1}"},
            {sys_key: "form", sys_values: "{ visible: true, editable: true, order: 1}"}
          ]
        }
      }
    ]}
  }) {
    sys_ObjectClass {
      id
    }
  }
}

In addition, relation annotations can be modeled to link to more than one concept from the reference dataset. For example, if we want to model a CEO relation annotation between a person and an organization, we can add the following snippet to our SOML schema:

CEOAnnotation:
   inherits: InlineAnnotation
   type: http://knowledge.net/Annotation/CEO
   props:
     subject: { label: "Subject", range: PersonAnnotation, meta: { search: { visible: false, order: 1 }, form: { visible: true, editable: true, order: 1 } } }
     object: { label: "Object", range: OrganizationAnnotation, meta: { search: { visible: false, order: 2 }, form: { visible: true, editable: true, order: 2 } } }

Or we can apply the following mutation to the /graphql endpoint of the Metadata Studio backend:

mutation createCustomInlineAnnotation {
  create_sys_ObjectClass(
    objects: {
      sys_id: "CEOAnnotation"
      sys_inherits: "voc:InlineAnnotation"
      sys_type: "http://knowledge.net/Annotation/CEO"
      sys_props: {
        sys_Property: [
          {
            sys_id: "subject"
            sys_range: "PersonAnnotation"
            sys_rdfProp: "omds:subject"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 1}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 1}"
                }
              ]
            }
          }
          {
            sys_id: "object"
            sys_range: "OrganizationAnnotation"
            sys_rdfProp: "omds:object"
            sys_meta: {
              sys_Meta: [
                { sys_key: "search", sys_values: "{ visible: true, order: 2}" }
                {
                  sys_key: "form"
                  sys_values: "{ visible: true, editable: true, order: 2}"
                }
              ]
            }
          }
        ]
      }
    }
  ) {
    sys_ObjectClass {
      id
    }
  }
}
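As a rough illustration of how such a mutation could be sent to the /graphql endpoint, the Python sketch below builds a standard GraphQL-over-HTTP POST request. The base URL, the bearer-token header, and the helper name build_graphql_request are assumptions; check your deployment for the actual host and authentication scheme.

```python
import json
import urllib.request
from typing import Optional


def build_graphql_request(base_url: str, mutation: str,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request to <base_url>/graphql."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the backend accepts an OAuth2 bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": mutation}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/graphql", data=body, headers=headers, method="POST"
    )


# Placeholder text; paste the full mutation from the documentation here.
MUTATION = "mutation createCustomInlineAnnotation { ... }"

# Hypothetical backend address.
request = build_graphql_request("http://localhost:9995", MUTATION)
# urllib.request.urlopen(request) would actually send it; omitted in this sketch.
```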

There are two ways to model a relation annotation:

  • It can be modeled to point to other annotations (as in the example above). In this case, when selecting a text in the Metadata Studio UI, you will be allowed to create such a relation only if the nested annotation types have already been created over a substring of the selected text.
  • It can be modeled to point to the objects from the reference dataset directly - in the above example, these are Person and Organization. When creating the relation annotation, you will be prompted to search for the concepts from the reference dataset for the nested entities.
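The precondition of the first option (all nested annotation types must already exist within the selected text) can be sketched in Python as follows. The function name can_create_relation and the representation of annotations as dictionaries with type, start, and end keys are assumptions for illustration; the UI's real check may differ in detail.

```python
def can_create_relation(selection: tuple, existing: list, required_types: list) -> bool:
    """Return True if every required annotation type occurs inside the selected span."""
    sel_start, sel_end = selection
    covered = {
        annotation["type"]
        for annotation in existing
        if sel_start <= annotation["start"] and annotation["end"] <= sel_end
    }
    return set(required_types) <= covered


# Hypothetical annotations already present in the document.
existing = [
    {"type": "PersonAnnotation", "start": 0, "end": 8},
    {"type": "OrganizationAnnotation", "start": 19, "end": 27},
]
required = ["PersonAnnotation", "OrganizationAnnotation"]

print(can_create_relation((0, 27), existing, required))  # selection covers both
print(can_create_relation((0, 10), existing, required))  # organization is outside
```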

Custom concept classes

When creating corpora for entity linking tasks, you might want to define custom concept classes. This will allow you to link instances of these classes as part of your annotations.

Your custom classes must inherit the default Concept class. Similarly to the annotations configurations, the meta property defines which concept fields will be visualized when doing entity linking. The search property defines a query that will be executed when the user creates PersonAnnotations and searches for People with names that match specific text. The {{query}} template is replaced at runtime with the specific text that you search for.

The following is an example for a declaration of a Person concept class:

Person:
    inherits: Concept
    type: mycustomprefix:Person
    props:
      description: {rdfProp: "mycustomprefix:description", meta: {search: {visible: true}, form: {visible: true}}}
      search:
        meta: {search: {visible: false}, form: {visible: false}}
        restrictive: true
        rdfProp: |
          ?_subject  skos:prefLabel|skos:altLabel ?label. filter (regex(str(?label), {{query}}, \"i\"))
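The placeholder substitution can be pictured with the short Python sketch below. Because {{query}} appears unquoted in the search pattern, the sketch assumes that the replacement is a quoted and escaped string literal; how the backend actually escapes the text is not documented here, and the helper name render_search is hypothetical.

```python
import json


def render_search(template: str, user_text: str) -> str:
    """Replace {{query}} with the user's text as a double-quoted, escaped literal."""
    # json.dumps yields a double-quoted string with backslash escapes (assumption:
    # this form is acceptable for the target query language).
    return template.replace("{{query}}", json.dumps(user_text))


template = '?_subject skos:prefLabel|skos:altLabel ?label. filter (regex(str(?label), {{query}}, "i"))'
print(render_search(template, "Ada Lovelace"))
```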

In case you have GraphDB Elasticsearch/Lucene connectors for better full-text search, these can be used as well. For example:

search:
    meta: {search: {visible: false}, form: {visible: false}}
    restrictive: true
    rdfProp: |
      ?search a inst:people_omds ;
      elastic:query '''
      {
        "query": {
          "function_score": {
            "query": {
              "match_phrase_prefix": {
                "name": {{query}}
              }
            },
            "script_score": {
              "script": {
                "source": "if (!doc.containsKey('rdfRank') || doc.get('rdfRank').isEmpty()) { return  1; } double rdfRank = doc['rdfRank'].value; return 1 + Math.max(0, Math.log10(rdfRank * 100))"
              }
            }
          }

        }
      }''' ;
      elastic:entities ?_subject .

Your GraphDB repository (defined through the sparql_endpoint_repository property of the Metadata Studio API) must contain instances for the custom concept classes. If this data is located in a different SPARQL repository, you can use a federated service:

Person:
    inherits: Concept
    sparqlFederatedService: wikidata
    typeProp: generatedType
    type: wd:Q5
    props:
     ...

where the HTTP endpoint location of the Wikidata federated service is configured with the sparql.federated.services.wikidata property of the Metadata Studio backend.

If you would like to integrate an external service that contains visualizations of the concepts from your custom concept class, you can define the external service in your Metadata Studio deployment and link your custom concept class to this service by pointing to the label of the service:

Person:
    inherits: Concept
    type: mycustomprefix:Person
    props:
      metadata: {search: {visible: false}, form: {visible: false}, externalService: "Wikidata"}
      ....

Custom form classes

When creating corpora for entity linking tasks, you might want to define custom form classes. These allow a form-based approach for cases where users do not know in advance which types of annotations they need to create: the users populate a form that refers to a number of predefined annotation types.

Your custom classes must inherit the default abstract Form class. Each property in your custom Form class must have either a Form or an Annotation class as its range. The label of the Form is shown in the form header.

The following is an example of a declaration of a Discharge form class:

DischargeForm:
        inherits: Form
        label: "Discharge Questionnaire"
        props:
          admission:
                range: Admission
                meta:
                  form:
                        order: 1
          hospitalization:
                range: Hospitalization
                meta:
                  form:
                        order: 2

Admission:
        inherits: Form
        label: "Admission data"
        props:
          patientAge:
                label: "Patient's age"
                range: PatientAge
                meta:
                  form:
                        order: 1

In order to visualize Forms for any documents, the document classes used must have the documentForm field of the meta property set. For example:

SimpleFormDocument:
        inherits: Document
        props:
          metadata:
                meta:
                  documentForm: DischargeForm

Creating Objects

This section describes how Metadata Studio can be filled with data by creating concrete objects.

Generally, there are two approaches - creating objects through the UI or through RDF. Currently, the UI supports creating instances for all classes except for Users and Annotation Services, which need to be set up through GraphQL mutations or RDF data.

Creating projects

The following is an example of the statements that can be inserted in the GraphDB repository (defined through the sparql_endpoint_repository property of the Metadata Studio API) in order for a project to appear in the UI:

<projectIRI> a omds:Project ;
         omds:createdAt "2022-02-10T14:59:00"^^xsd:dateTime ;
         omds:createdBy <userIRI> ;
         omds:modifiedAt "2022-03-23T09:45:00"^^xsd:dateTime ;
         omds:modifiedBy <userIRI>  ;
         omds:status "ACTIVE" ;
         rdfs:label "Project Name" .

If you want to enable an external service in your project, you need to bind it either through the Metadata Studio UI or through SPARQL like so:

<projectIRI> a omds:Project ;
             omds:createdAt "2022-02-10T14:59:00"^^xsd:dateTime ;
             omds:createdBy <userIRI> ;
             omds:modifiedAt "2022-03-23T09:45:00"^^xsd:dateTime ;
             omds:modifiedBy <userIRI>  ;
             omds:status "ACTIVE" ;
             omds:externalService <externalServiceIRI> ;
             rdfs:label "Project Name" .

Creating corpora

The following is an example of the statements that can be inserted in the GraphDB repository in order for a corpus to appear in the UI:

<projectIRI> omds:corpus <corpusIRI> .
<corpusIRI> a omds:Corpus ;
            omds:createdAt "2022-03-23T09:45:00"^^xsd:dateTime ;
            omds:createdBy <userIRI> ;
            omds:modifiedAt "2022-03-23T09:45:00"^^xsd:dateTime ;
            omds:modifiedBy <userIRI>  ;
            omds:status "ACTIVE" ;
            rdfs:label "Corpus Name" .

Creating documents

The following is an example of the statements that can be inserted in the GraphDB repository in order for a document to appear in the UI:

<corpusIRI> omds:document <documentIRI> .
<documentIRI> rdf:type omds:LegalContract ;
              rdfs:label "Berkshire Hills Bancorp Inc 2012-08-09" ;
              omds:text "Document content in plain text" ;
              omds:category "Endorsement Agreement" ;
              omds:createdBy <userIRI> ;
              omds:modifiedBy <userIRI> ;
              omds:createdAt "2022-05-20T08:00:00"^^xsd:dateTime ;
              omds:modifiedAt "2022-05-20T08:00:00"^^xsd:dateTime .
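Real document content usually contains quotes and line breaks, which must be escaped before the text can be used as the omds:text literal. The Python sketch below shows one way to do this for a short-form Turtle string; the helper name turtle_literal is hypothetical and the sketch covers only the most common escape sequences.

```python
def turtle_literal(text: str) -> str:
    """Render text as a double-quoted Turtle string literal with basic escaping."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes stay intact
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


content = 'This "Endorsement Agreement"\nis made on 2012-08-09.'
print(f"omds:text {turtle_literal(content)} ;")
```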

Creating annotations

The following is an example of the statements that can be inserted in the GraphDB repository in order for an annotation to appear in the UI:

<documentIRI> omds:annotations <annotationIRI> .
<annotationIRI> rdf:type omds:Annotation ;
                rdf:type <customAnnotationTypeIRI> ;
                omds:createdAt "2022-05-20T08:00:00"^^xsd:dateTime ;
                omds:modifiedAt "2022-05-20T08:00:00"^^xsd:dateTime ;
                omds:createdBy <userIRI> ;
                omds:modifiedBy <userIRI> .

In case we are working with inline annotations, the following two statements must also be set:

<annotationIRI> omds:annotationStart "10"^^xsd:int ;
            omds:annotationEnd "20"^^xsd:int .

If the specific annotation type contains custom fields, they can be added to the triples above.
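When inline annotations are generated by a script, the two offset statements can be rendered with a small helper such as the Python sketch below. The function name inline_offset_triples is hypothetical, and the source does not state whether annotationEnd is inclusive or exclusive, so the sketch only checks that the start is non-negative and smaller than the end.

```python
def inline_offset_triples(annotation_iri: str, start: int, end: int) -> str:
    """Render the omds:annotationStart / omds:annotationEnd statements for one annotation."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return (
        f'<{annotation_iri}> omds:annotationStart "{start}"^^xsd:int ;\n'
        f'    omds:annotationEnd "{end}"^^xsd:int .'
    )


# Hypothetical annotation IRI.
print(inline_offset_triples("http://example.org/annotation/1", 10, 20))
```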

Creating forms

There is no need to create instances of type Form in the GraphDB repository. Whenever new forms need to be added or the structure of the forms changes, the changes should be reflected in the SOML configuration of the form types, their labels, and their properties.

Warning

During the design of the annotation schema, make sure that all form properties are unique. Duplicated properties lead to unwanted behaviour in the Form view of the Document view, such as annotations that point to identical range objects appearing to have identical values.
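A quick way to catch duplicated form properties before loading a schema is sketched below in Python. It assumes that uniqueness is required across all form classes and that the schema has already been read into a dictionary mapping form class names to property names; both the helper name duplicate_form_properties and this input shape are illustrative assumptions.

```python
from collections import defaultdict


def duplicate_form_properties(forms: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each property name used by more than one form class to those classes."""
    owners = defaultdict(list)
    for form_name, props in forms.items():
        for prop in props:
            owners[prop].append(form_name)
    return {prop: names for prop, names in owners.items() if len(names) > 1}


# Hypothetical form classes; "patientAge" is declared twice.
forms = {
    "DischargeForm": ["admission", "hospitalization"],
    "Admission": ["patientAge"],
    "Hospitalization": ["patientAge", "diagnosis"],
}
print(duplicate_form_properties(forms))
```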

Validating Objects

This section describes how Metadata Studio can be configured to apply field validations upon creation of concrete objects.

Generally, validation happens on the backend, so validation constraints need to be configured in the SOML. Currently, the UI picks up the SOML validation when the SOML contains the following configuration, which is present in the default SOML:

config:
        exposeSomlInGraphQL: true

All possible validation characteristics are listed in the Property Characteristics section of the Semantic Objects documentation.

Validating Annotation fields

Fields of any object class can be configured with validation constraints. The following example shows how the “age” field of PersonalInformation annotation objects is configured to be an integer in the range from 0 to 120:

PersonalInformation:
        inherits: DocumentAnnotation
        props:
                name: { range: string, min: 1, meta: { form: { visible: true, editable: true, order: 0 } } }
                age: { range: integer, minInclusive: 0, maxInclusive: 120 , meta: { form: { visible: true, editable: true, order: 1 } } }
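The meaning of the minInclusive and maxInclusive constraints on the age field can be illustrated with the following Python sketch. The actual validation is performed by the backend based on the SOML; the function validate_integer and its error messages are hypothetical and only mirror the intended semantics.

```python
def validate_integer(value, min_inclusive=None, max_inclusive=None) -> list[str]:
    """Return a list of violations for an integer field; an empty list means valid."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return [f"{value!r} is not an integer"]
    errors = []
    if min_inclusive is not None and value < min_inclusive:
        errors.append(f"{value} is less than minInclusive {min_inclusive}")
    if max_inclusive is not None and value > max_inclusive:
        errors.append(f"{value} is greater than maxInclusive {max_inclusive}")
    return errors


print(validate_integer(42, min_inclusive=0, max_inclusive=120))
print(validate_integer(130, min_inclusive=0, max_inclusive=120))
```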

Users

Depending on the OAuth2 service that you would like to use, the users need to be configured in the specific service storage. The Metadata Studio deployment comes with Keycloak as a default user management solution.

Note

Usernames cannot contain whitespace characters. The reason is that the Metadata Studio tool relies on the convention that the concatenation between http://www.ontotext.com/metadatastudio/user/ and the username must result in a valid IRI.
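The convention can be checked ahead of time with a few lines of Python. This is a minimal sketch: the helper name user_iri is hypothetical, and it checks only for whitespace, not for every character that is invalid in an IRI.

```python
USER_IRI_PREFIX = "http://www.ontotext.com/metadatastudio/user/"


def user_iri(username: str) -> str:
    """Concatenate the user prefix and the username, rejecting whitespace."""
    if not username or any(ch.isspace() for ch in username):
        raise ValueError("username must be non-empty and must not contain whitespace")
    return USER_IRI_PREFIX + username


print(user_iri("borislav"))
```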

Once you log in to Metadata Studio with a specific user, all objects that this user creates are assigned with createdBy and modifiedBy fields that point to the specific user identifier.

If you set up initial RDF data for projects, corpora, documents, or annotations, in order for the UI to be able to visualize the user references from the createdBy and modifiedBy predicates, a username must be defined in the database:

PREFIX omds: <http://www.ontotext.com/metadatastudio#>
<http://www.ontotext.com/metadatastudio/user/borislav> a omds:User;
      omds:username "borislav" .

Note that the exact predicate for the username needs to be synced with the predicate defined for the user’s username field in the default SOML schema, which by default looks like this:

User:
    props:
      username: {readOnly: true, rdfProp: omds:username}
      ...

By default, Metadata Studio has the following user roles:

  • Default: Restricts access to everything if the logged-in user does not have any roles assigned.
  • Curator: Grants read access to all resources as well as the right to create annotations for existing documents.
  • Admin: Grants all actions on all objects and their properties to a user with that role.
  • SchemaRBACAdmin: Allows the user to modify the SOML schema.

New roles can be added and modified by a user with a SchemaRBACAdmin role. For more information on the syntax of the RBAC schema, see the official Ontotext Platform Semantic Objects documentation.

Creating Annotation Services

If a third-party text analysis annotation service needs to be integrated in Metadata Studio, a query to register the text analysis service and to handle the annotation needs to be configured. Metadata Studio relies on the GraphDB Text Mining plugin to integrate with arbitrary third-party text analysis services. Unlike the other configurable components, the text mining annotation query cannot be configured through the Metadata Studio UI yet. It needs to be set up by applying a GraphQL mutation to the Metadata Studio API. The mutation registers the annotation service with:

  • a specific text mining plugin registration query
  • a specific annotation query

The AnnotationService object has a label and a serviceId. The label is the name under which the annotation service is shown in the Annotation services drop-down in the UI.

Registration query

The registration query is a query that instantiates the GraphDB Text Mining plugin. It must specify the URL to the text mining service, the headers that must be sent during annotation requests, and any specific transformations that should be applied over the annotation response.

In addition, it creates a label for the annotation service that the UI uses to visualize the service creator source.

For more information on how to register text mining plugins, check the GraphDB documentation.

mutation createLegalTaggerServiceExample {
  create_AnnotationService(
    objects: {
      label: "Legal Tagger",
      serviceId: "http://cuad.ontotext.com/legalTagger",
      createdBy: "some-user-identifier",
      createdAt: "some-timestamp",
      registrationQuery: """
        PREFIX : <http://www.ontotext.com/textmining#>
        PREFIX inst: <http://www.ontotext.com/textmining/instance#>
        PREFIX cuad: <http://cuad.ontotext.com/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

        INSERT DATA {
            inst:legalTagger :connect :Ces;
                            :service "some-url-here" ;
                            :header "Accept: application/vnd.ontotext.ces+json;charset=utf-8";
                            :header "Content-type: text/plain".

            cuad:legalTagger a <http://www.ontotext.com/metadatastudio#AnnotationService> ;
                            rdfs:label "Legal Tagger" .
        }
      """,
      annotationQuery: ....
      ....
    }
  ) {
    annotationService {
      id
    }
  }
}

It is recommended that the IRI identifier that we declare as an annotation service and the specific rdfs:label match the serviceId and label values from the mutation.

Annotation query

Upon selection of a specific annotation service for a particular corpus, the Metadata Studio backend splits the documents from the corpus into batches of ten documents. It then sends the documents from each batch to the text mining API service to generate annotations for these documents.

The annotation query defines how the documents should be sent for annotation and how the response should be stored in GraphDB. It is entirely configurable by the user, which makes this process compatible with any third-party service that is accessible through HTTP and produces annotations with text position offsets.

The annotation query also takes care of cleaning up previously existing annotations from the same text mining API service, which allows you to execute multiple annotation processes over the same corpus over time.

Tip

It is a good practice to keep the annotations from each annotation service in a specific context, as this makes the maintenance of the data easier. The format in which the annotations are stored should correspond to the format defined in the Custom annotations section. The createdBy and modifiedBy fields should point to the IRI of the specific annotation service.

The following is an example of an annotation query that creates inline annotations returned from the Legal tagger:

mutation createLegalTaggerServiceExample {
  create_AnnotationService(
    objects: {
      label: "Legal Tagger",
      serviceId: "http://cuad.ontotext.com/legalTagger",
      createdBy: "some-user-identifier",
      createdAt: "some-timestamp",
      registrationQuery: .....,
      annotationQuery: """
          PREFIX inst: <http://www.ontotext.com/textmining/instance#>
          PREFIX : <http://www.ontotext.com/textmining#>
          PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
          PREFIX omds: <http://www.ontotext.com/metadatastudio#>
          PREFIX cuad: <http://cuad.ontotext.com/>
          PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

          DELETE {
            GRAPH <http://cuad.ontotext.com/legalTagger> {
              ?document omds:annotations ?oldAnnotation .
              ?oldAnnotation ?oldPredicate ?oldObject
            }
          }
          INSERT {
              GRAPH <http://cuad.ontotext.com/legalTagger> {
                ?annotationId omds:annotationStart ?annotationStart ;
                      omds:annotationEnd ?annotationEnd ;
                      omds:document ?document ;
                      a omds:Annotation ;
                      a ?annotationType ;
                      omds:createdBy cuad:legalTagger ;
                      omds:createdAt ?time ;
                      omds:modifiedBy cuad:legalTagger ;
                      omds:modifiedAt ?time  ;
                      ?answerPredicate ?answer .
                 ?document omds:annotations ?annotationId .
              }
          }
          WHERE {
            {
                ?service a inst:tagService;
                       :text ?text ;
                       :serviceErrors -1 .
                {
                    SELECT ?text ?document ?time WHERE {
                      VALUES ?document {
                        {{documents}}
                      }
                      ?document omds:text ?text .
                      BIND(NOW() as ?time)
                    }
                }
                graph inst:legalTagger {
                  ?annotatedDocument :annotations ?annotation .
                  ?annotation :annotationText ?answer ;
                        :annotationType ?type ;
                        :annotationStart ?annotationStartLong ;
                        :annotationEnd ?annotationEndLong .
                  ?annotation :features/:class ?class .
                }
                BIND(xsd:int(?annotationStartLong) as ?annotationStart)
                BIND(xsd:int(?annotationEndLong) as ?annotationEnd)

                BIND(IRI(CONCAT("http://cuad.ontotext.com/InlineAnnotation/", ?type)) as ?annotationType)
                BIND(IRI(CONCAT(CONCAT("http://cuad.ontotext.com/InlineAnnotation/", "Tagger/", STRUUID()), STRAFTER(STR(?annotation),"-something-that-is-not-present-"))) as ?annotationId)
            }
            UNION # Use union to select also the annotations from previous annotation processes in order to delete them
            {
              GRAPH <http://cuad.ontotext.com/legalTagger> {
                ?document omds:annotations ?oldAnnotation .
                VALUES ?document {
                  {{documents}}
                }
                ?oldAnnotation omds:createdAt ?createdAt .
                ?oldAnnotation ?oldPredicate ?oldObject .
                filter (?createdAt != ?time)
              }
            }
        }
        """,
      ....
    }
  ) {
    annotationService {
      id
    }
  }
}

When the annotation process is triggered from the UI for a particular corpus, the Metadata Studio backend retrieves all documents from this corpus. It splits the documents into batches of ten and processes all batches sequentially by replacing the {{documents}} placeholder with the document IDs from each batch.
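The batching and placeholder replacement can be sketched in Python as follows. This is not the backend implementation: the function names are hypothetical, and the sketch assumes that the document IDs are IRIs that are rendered in angle brackets inside the VALUES block.

```python
def batches(items: list, size: int = 10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def render_annotation_query(template: str, document_iris: list) -> str:
    """Replace every {{documents}} placeholder with the IRIs of one batch."""
    values = " ".join(f"<{iri}>" for iri in document_iris)
    return template.replace("{{documents}}", values)


# Hypothetical corpus with 25 documents and a shortened query template.
documents = [f"http://example.org/document/{n}" for n in range(25)]
template = "VALUES ?document { {{documents}} }"

# Batches are processed one after another, each with its own rendered query.
queries = [render_annotation_query(template, batch) for batch in batches(documents)]
print(len(queries))
```

Because the example annotation query contains the placeholder twice (once for the new annotations and once for the cleanup of old ones), the sketch replaces all occurrences.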

Creating External Services

External services improve the annotation workflow in Metadata Studio by providing quick access to external tools that visualize the concepts from the reference dataset that you are working with. For example, if you create annotations against Wikidata, you can integrate Metadata Studio with the Wikidata Web interface. As a result, whenever you click on annotations for a concept, you will be redirected to the Wikidata page containing the information about this concept.

To define external services, you need to insert an RDF definition for these services in GraphDB. This includes the label that this service will be referenced by in your SOML schema as well as how to compute the URL to the external service based on the Concept’s IRI. For the latter, you can make use of the <concept-iri> variable. Thus, the external service must provide a GET REST endpoint that accepts the concept’s IRI as a path or as a request parameter.

Note

To use the External services, you must have referenced them both in the Project configuration and the Concept class definition.

Wikidata external service

To resolve the URLs for Wikidata, you can use the Wikidata concept ID directly, as this points to the corresponding concept Wikidata page.

@prefix omds: <http://www.ontotext.com/metadatastudio#> .
@prefix omds-ext: <http://www.ontotext.com/metadatastudio#extService/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

omds-ext:WikidataService a omds:ExternalService ;
  rdfs:label "Wikidata" ;
  omds:url "<concept-iri>".

NOW external service

For integration with services such as now.ontotext.com, in which the concept information page is built by applying the concept IRI as a suffix to a URL, you can use the following RDF:

@prefix omds: <http://www.ontotext.com/metadatastudio#> .
@prefix omds-ext: <http://www.ontotext.com/metadatastudio#extService/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

omds-ext:NOWService a omds:ExternalService ;
  rdfs:label "NOW" ;
  omds:url "https://now.ontotext.com/#/concept&uri=<concept-iri>".
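How the two omds:url templates resolve for a given concept can be sketched in Python as follows. The helper name external_url is hypothetical, and the sketch assumes that the <concept-iri> variable is replaced verbatim; whether Metadata Studio URL-encodes the IRI is not specified here.

```python
def external_url(url_template: str, concept_iri: str) -> str:
    """Substitute the <concept-iri> variable in an external service URL template."""
    return url_template.replace("<concept-iri>", concept_iri)


concept = "http://www.wikidata.org/entity/Q42"

# Wikidata: the concept IRI itself is the page address.
print(external_url("<concept-iri>", concept))
# NOW: the concept IRI is appended to a fixed URL.
print(external_url("https://now.ontotext.com/#/concept&uri=<concept-iri>", concept))
```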