NIF 2.0 Transformation Engine

Low-level NLP results are typically not included in the RDF enhancement results. This engine supports the serialization of such results using the NIF 2.0 (NLP Interchange Format) standard.

Processed Information (Input)

Apache Stanbol manages NLP results in the Analysed Text content part. This ContentPart provides a Java API for accessing those results. This engine reads that information and transforms it according to the NIF 2.0 core ontology. The transformed information is added as RDF to the Enhancement Metadata and is included in the RDF response of the enhancement request.

If a ContentItem does not contain this content part, it is not processed by this engine.

Created RDF

The engine serializes NLP annotations as defined by the NIF 2.0 core ontology. More specifically, the engine can serialize the following information:

Segment URIs use the RFC 5147 scheme. It can be configured whether the nif:RFC5147String type is added only to the nif:Context instance or to all serialized nif:String instances.
Selector information such as nif:beginIndex, nif:endIndex, nif:before, nif:anchorOf and nif:after. For spans longer than 100 characters the nif:head property is used instead of nif:anchorOf. There is an option to prevent these properties from being serialized. This greatly decreases the triple count, but clients then need to parse the start/end positions from the segment URI.
All serialized nif:String instances refer to the nif:Context via the nif:referenceContext property. The context refers to the URI of the ContentItem via the nif:sourceUrl property. Including the content as a String literal is NOT supported by this engine.
String hierarchies: This includes the nif:subString, nif:superString and nif:sentence properties. If not required, the serialization of these properties can be deactivated.
String navigation: This includes the nif:nextSentence, nif:previousSentence, nif:nextWord and nif:previousWord properties. The transitive versions of these properties are NOT supported; users who need them can derive them with a reasoner. String navigation properties can be deactivated, which greatly decreases the triple count.
String annotations: This currently includes nif:oliaCategory, nif:oliaConf and nif:posTag. nif:oliaLink is not supported because the Stanbol NLP API does not provide the required information. Support for word-level sentiment annotations is also not yet implemented.
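
When selector properties are not serialized, a client has to recover the offsets from the segment URI itself. The following is a minimal Python sketch of that step, assuming the fragment has the simple form seen in the examples on this page (char=<start> or char=<start>,<end>). The function name parse_segment_uri is hypothetical, and the optional RFC 5147 integrity-check suffixes are not handled.

```python
import re

# Simple RFC 5147 character fragments as used by the segment URIs on this page:
# "char=<start>" (position only) or "char=<start>,<end>" (range).
# Assumption: no ";length=" or ";md5=" integrity checks are appended.
_CHAR_FRAGMENT = re.compile(r"^char=(\d+)(?:,(\d+))?$")


def parse_segment_uri(uri):
    """Return (base_uri, start, end); end is None for a position-only fragment."""
    base, _sep, fragment = uri.rpartition("#")
    match = _CHAR_FRAGMENT.match(fragment)
    if match is None:
        raise ValueError("not a simple RFC 5147 char fragment: %r" % uri)
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else None
    return base, start, end
```

For a URI ending in #char=0,56 this returns the ContentItem URI together with the offsets 0 and 56.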

Configuration

The engine provides several switches that enable or disable the serialization of specific NIF information. Multiple engine instances with different configurations can be set up. The following figure shows the configuration dialog:

[Figure: NIF 2.0 Engine Configuration]

Selector (enhancer.engines.nlp2rdf.selector): Enables or disables the serialization of selector-related properties such as nif:beginIndex, nif:endIndex, nif:before, nif:anchorOf and nif:after. If disabled, clients can still parse the start/end indexes from the RFC 5147 encoded segment URI.
Hierarchy (enhancer.engines.nlp2rdf.hierarchy): Enables or disables the writing of hierarchical links. This includes the nif:sentence, nif:superString and nif:subString properties.
Previous and Next Links (enhancer.engines.nlp2rdf.previousNext): Enables or disables the serialization of links to the previous/next sentence and word.
Context only URI Scheme (enhancer.engines.nlp2rdf.cotextOnlyUriScheme): If enabled, the nif:RFC5147String rdf:type is added only to the nif:Context. If disabled, it is added to all segments.
String Type (enhancer.engines.nlp2rdf.writeStringType): If enabled, the nif:String type is added to all serialized segments. If disabled, only more specific types such as nif:Sentence or nif:Word are used.
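
The interplay of the last two switches can be summarized in a few lines of Python. This is only an illustration of the behaviour described above, not the engine's implementation; the function segment_types and its parameter names are hypothetical and merely mirror the two configuration properties.

```python
def segment_types(specific_type, context_only_uri_scheme, write_string_type):
    """Sketch of the rdf:type values expected for a segment other than the nif:Context.

    specific_type           -- e.g. "nif:Sentence", "nif:Word" or "nif:Phrase"
    context_only_uri_scheme -- value of the "Context only URI Scheme" switch
    write_string_type       -- value of the "String Type" switch
    """
    types = [specific_type]
    if not context_only_uri_scheme:
        # The URI scheme type is written for every segment, not only the context.
        types.append("nif:RFC5147String")
    if write_string_type:
        # The generic nif:String type is added next to the specific one.
        types.append("nif:String")
    return sorted(types)
```

With both switches disabled, a word segment gets nif:RFC5147String and nif:Word, which matches the types shown in the Examples section.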

Examples

This section provides some examples of the RDF generated by this engine. OpenNLP was used to create the serialized NLP annotations. The sentence "The Apache Stanbol Enhancer can detect entities in text." was used to generate the examples.

@prefix content: <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix olia: <http://purl.org/olia/olia.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

The first Turtle snippet shows the nif:Context instance. It is referenced by all segments and refers to the URI of the ContentItem via nif:sourceUrl.

content:char=0
    a nif:Context ,  nif:RFC5147String ;
    nif:anchorOf
        "The Apache Stanbol Enhancer can detect entities in text."@en ;
    nif:beginIndex
        "0"^^xsd:int ;
    nif:endIndex
        "56"^^xsd:int ;
    nif:sourceUrl
        <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .

Next is the segment describing the only sentence in the example text. NOTE: an empty nif:before or nif:after string indicates that the segment starts at the beginning, or ends at the end, of the parsed content.

content:char=0,56
    a nif:RFC5147String ,  nif:Sentence ;
    nif:before
        ""@en ;
    nif:anchorOf
        "The Apache Stanbol Enhancer can detect entities in text."@en ;
    nif:after
        ""@en ;
    nif:beginIndex
        "0"^^xsd:int ;
    nif:endIndex
        "56"^^xsd:int ;
    nif:firstWord
        content:char=0,3 ;
    nif:referenceContext
        content:char=0 .
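
The selector values of a segment can be reproduced from the plain text and the two offsets. The Python sketch below is an illustration only: the helper selector_properties is hypothetical, and the 10-character context window is inferred from the example values on this page rather than taken from a documented setting. The nif:head case for spans longer than 100 characters is not covered.

```python
def selector_properties(text, start, end, window=10):
    """Compute selector values for the span text[start:end].

    Assumption: nif:before / nif:after hold up to `window` characters of
    surrounding context and are empty at the very start / end of the text.
    """
    return {
        "nif:beginIndex": start,
        "nif:endIndex": end,
        "nif:before": text[max(0, start - window):start],
        "nif:anchorOf": text[start:end],
        "nif:after": text[end:end + window],
    }


text = "The Apache Stanbol Enhancer can detect entities in text."
sentence = selector_properties(text, 0, len(text))
# The sentence spans the whole text, so both context strings are empty.
```

Because the sentence covers the complete content, both nif:before and nif:after come out as empty strings, as described in the note above.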

The following snippet shows the segments for the first three words of the Sentence.

content:char=0,3
    a nif:RFC5147String ,  nif:Word ;
    nif:before
        ""@en ;
    nif:anchorOf
        "The"@en ;
    nif:after
        " Apache St"@en ;
    nif:beginIndex
        "0"^^xsd:int ;
    nif:endIndex
        "3"^^xsd:int ;
    nif:nextWord
        content:char=4,10 ;
    nif:oliaCategory
         olia:Determiner ,  olia:PronounOrDeterminer ;
    nif:oliaConf
        "0.9662179110607207"^^xsd:double ;
    nif:posTag
        "DT"^^xsd:string ;
    nif:referenceContext
        content:char=0 ;
    nif:sentence
        content:char=0,56 ;
    nif:subString
        content:char=0,10 .

content:char=4,10
    a nif:RFC5147String ,  nif:Word ;
    nif:before
        "The "@en ;
    nif:anchorOf
        "Apache"@en ;
    nif:after
        " Stanbol E"@en ;
    nif:beginIndex
        "4"^^xsd:int ;
    nif:endIndex
        "10"^^xsd:int ;
    nif:nextWord
        content:char=11,18 ;
    nif:oliaCategory
         olia:Noun ,  olia:PluralQuantifier ,  olia:ProperNoun ,  olia:Quantifier ;
    nif:oliaConf
        "0.7882547205652428"^^xsd:double ;
    nif:posTag
        "NNPS"^^xsd:string ;
    nif:previousWord
        content:char=0,3 ;
    nif:referenceContext
        content:char=0 ;
    nif:sentence
        content:char=0,56 ;
    nif:subString
        content:char=0,10 .

content:char=11,18
    a nif:RFC5147String ,  nif:Word ;
    nif:before
        "he Apache "@en ;
    nif:anchorOf
        "Stanbol"@en ;
    nif:after
        " Enhancer "@en ;
    nif:beginIndex
        "11"^^xsd:int ;
    nif:endIndex
        "18"^^xsd:int ;
    nif:nextWord
        content:char=19,27 ;
    nif:oliaCategory
         olia:Noun ,  olia:ProperNoun ,  olia:Quantifier ,  olia:SingularQuantifier ;
    nif:oliaConf
        "0.701014272348203"^^xsd:double ;
    nif:posTag
        "NNP"^^xsd:string ;
    nif:previousWord
        content:char=4,10 ;
    nif:referenceContext
        content:char=0 ;
    nif:sentence
        content:char=0,56 ;
    nif:subString
        content:char=11,27 .
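
The nif:previousWord and nif:nextWord links in the snippets above form a simple chain over the word spans. As a rough Python sketch of that structure (not the engine's code; navigation_links is a hypothetical helper, and the spans are assumed to belong to a single sentence):

```python
def navigation_links(word_spans):
    """Map each (start, end) word span to its neighbours in reading order."""
    ordered = sorted(word_spans)
    links = {}
    for i, span in enumerate(ordered):
        links[span] = {
            # The first word has no predecessor, the last word no successor.
            "nif:previousWord": ordered[i - 1] if i > 0 else None,
            "nif:nextWord": ordered[i + 1] if i + 1 < len(ordered) else None,
        }
    return links


links = navigation_links([(0, 3), (4, 10), (11, 18), (19, 27)])
```

Here the span (4, 10) links back to (0, 3) and forward to (11, 18), corresponding to the char=4,10 segment shown above.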

Phrases are also exported as RDF. The following is an example of a verb phrase. The segment for the verb, which links to the phrase using nif:subString, is also included.

content:char=28,38
    a nif:Phrase ,  nif:RFC5147String ;
    nif:before
        " Enhancer "@en ;
    nif:anchorOf
        "can detect"@en ;
    nif:after
        " entities "@en ;
    nif:beginIndex
        "28"^^xsd:int ;
    nif:endIndex
        "38"^^xsd:int ;
    nif:oliaCategory
         olia:VerbPhrase ;
    nif:oliaConf
        "0.9864510669287669"^^xsd:double ;
    nif:referenceContext
        content:char=0 ;
    nif:superString
        content:char=0,56 .

content:char=32,38
    a nif:RFC5147String ,  nif:Word ;
    nif:before
        "ancer can "@en ;
    nif:anchorOf
        "detect"@en ;
    nif:after
        " entities "@en ;
    nif:beginIndex
        "32"^^xsd:int ;
    nif:endIndex
        "38"^^xsd:int ;
    nif:nextWord
        content:char=39,47 ;
    nif:oliaCategory
         olia:Infinitive ,  olia:Verb ;
    nif:oliaConf
        "0.9930989756397197"^^xsd:double ;
    nif:posTag
        "VB"^^xsd:string ;
    nif:previousWord
        content:char=28,31 ;
    nif:referenceContext
        content:char=0 ;
    nif:sentence
        content:char=0,56 ;
    nif:subString
        content:char=28,38 .
