This project has retired. For details please refer to its Attic page.
Apache Stanbol - NIF 2.0 Transformation Engine

Low-level NLP results are typically not included in the RDF enhancement results. This engine supports the serialization of such results using the NIF 2.0 (NLP Interchange Format) standard.

Processed Information (Input)

Apache Stanbol manages NLP results in the Analysed Text content part. This ContentPart provides a Java API for accessing those results. This engine reads that information and transforms it according to the NIF 2.0 core ontology. The transformed information is added as RDF to the Enhancement Metadata and included in the RDF response of the enhancement request.

If a ContentItem does not contain this content part, it will not be processed by this engine.

Created RDF

The engine serializes NLP annotations as defined by the NIF 2.0 core ontology. The examples below cover the nif:Context of the text as well as sentences, words with their part-of-speech annotations, and phrases.

Configuration

The engine supports several switches for enabling or disabling the serialization of NIF information. Multiple instances of the engine can be configured, each with a different configuration. The following figure shows the configuration dialog:

[Figure: NIF 2.0 Engine Configuration]

Examples

This section provides some examples of RDF generated by this engine. OpenNLP was used to create the serialized NLP annotations. The sentence "The Apache Stanbol Enhancer can detect entities in text." was used to generate these examples.

@prefix content: <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix olia: <http://purl.org/olia/olia.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

The first Turtle snippet shows the nif:Context instance. All segments reference this context, and it points to the URI of the ContentItem via nif:sourceUrl.

content:char=0
    a nif:Context ,  nif:RFC5147String ;
    nif:anchorOf
        "The Apache Stanbol Enhancer can detect entities in text."@en ;
    nif:beginIndex
        "0"^^xsd:int ;
    nif:endIndex
        "56"^^xsd:int ;
    nif:sourceUrl
        <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .
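
The segment identifiers in these examples follow a "char=begin" or "char=begin,end" fragment pattern appended to the ContentItem URI. The following Python sketch reproduces that naming for the example text. It is only an illustration derived from the snippets on this page, not the engine's actual code, and the helper name segment_uri is hypothetical.

```python
# Sketch only: mirrors the URI pattern visible in the examples on this page.
TEXT = "The Apache Stanbol Enhancer can detect entities in text."
BASE = "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e"


def segment_uri(base, begin, end=None):
    """Build '<base>#char=begin' (context) or '<base>#char=begin,end' (segment).

    Hypothetical helper; the name and signature are assumptions.
    """
    fragment = f"char={begin}" if end is None else f"char={begin},{end}"
    return f"{base}#{fragment}"


context_uri = segment_uri(BASE, 0)               # the nif:Context
sentence_uri = segment_uri(BASE, 0, len(TEXT))   # the whole text as one segment

print(context_uri)
print(sentence_uri)
```

For the example text, len(TEXT) is 56, so the second URI ends in "#char=0,56", matching the nif:endIndex shown above.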

Next is the segment describing the only sentence in the example text. NOTE: an empty nif:before or nif:after string indicates that the segment starts at the beginning or ends at the end of the parsed content.

content:char=0,56
    a nif:RFC5147String ,  nif:Sentence ;
    nif:before
        ""@en ;
    nif:anchorOf
        "The Apache Stanbol Enhancer can detect entities in text."@en ;
    nif:after
        ""@en ;
    nif:beginIndex
        "0"^^xsd:int ;
    nif:endIndex
        "56"^^xsd:int ;
    nif:firstWord
        content:char=0,3 ;
    nif:referenceContext
        content:char=0 .
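
The nif:before and nif:after values can be thought of as fixed-size windows of surrounding text, clipped at the boundaries of the content. The Python sketch below illustrates this. The window size of 10 characters is an assumption inferred from the examples on this page, and context_strings is a hypothetical helper, not part of Stanbol.

```python
# Sketch only: derives nif:before / nif:anchorOf / nif:after style strings.
TEXT = "The Apache Stanbol Enhancer can detect entities in text."
WINDOW = 10  # assumption: inferred from the example output, not from the engine


def context_strings(text, begin, end, window=WINDOW):
    """Return (before, anchor, after) for the span [begin, end).

    The windows are clipped at the text boundaries, so a span that starts
    at offset 0 gets an empty 'before' and a span that ends at len(text)
    gets an empty 'after'.
    """
    before = text[max(0, begin - window):begin]
    anchor = text[begin:end]
    after = text[end:end + window]
    return before, anchor, after


print(context_strings(TEXT, 0, len(TEXT)))  # whole sentence: both windows empty
print(context_strings(TEXT, 4, 10))         # ('The ', 'Apache', ' Stanbol E')
```

Clipping with max(0, ...) and plain slicing keeps the boundary cases free of special handling: Python slices past the end of a string simply return the shorter remainder.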

The following snippet shows the segments for the first three words of the Sentence.

content:char=0,3
    a nif:RFC5147String ,  nif:Word ;
    nif:before
        ""@en ;
    nif:anchorOf
        "The"@en ;
    nif:after
        " Apache St"@en ;
    nif:beginIndex
        "0"^^xsd:int ;
    nif:endIndex
        "3"^^xsd:int ;
    nif:nextWord
        content:char=4,10 ;
    nif:oliaCategory
         olia:Determiner ,  olia:PronounOrDeterminer ;
    nif:oliaConf
        "0.9662179110607207"^^xsd:double ;
    nif:posTag
        "DT"^^xsd:string ;
    nif:referenceContext
        content:char=0 ;
    nif:sentence
        content:char=0,56 ;
    nif:subString
        content:char=0,10 .

content:char=4,10
    a nif:RFC5147String ,  nif:Word ;
    nif:before
        "The "@en ;
    nif:anchorOf
        "Apache"@en ;
    nif:after
        " Stanbol E"@en ;
    nif:beginIndex
        "4"^^xsd:int ;
    nif:endIndex
        "10"^^xsd:int ;
    nif:nextWord
        content:char=11,18 ;
    nif:oliaCategory
         olia:Noun ,  olia:PluralQuantifier ,  olia:ProperNoun ,  olia:Quantifier ;
    nif:oliaConf
        "0.7882547205652428"^^xsd:double ;
    nif:posTag
        "NNPS"^^xsd:string ;
    nif:previousWord
        content:char=0,3 ;
    nif:referenceContext
        content:char=0 ;
    nif:sentence
        content:char=0,56 ;
    nif:subString
        content:char=0,10 .

content:char=11,18
    a nif:RFC5147String ,  nif:Word ;
    nif:before
        "he Apache "@en ;
    nif:anchorOf
        "Stanbol"@en ;
    nif:after
        " Enhancer "@en ;
    nif:beginIndex
        "11"^^xsd:int ;
    nif:endIndex
        "18"^^xsd:int ;
    nif:nextWord
        content:char=19,27 ;
    nif:oliaCategory
         olia:Noun ,  olia:ProperNoun ,  olia:Quantifier ,  olia:SingularQuantifier ;
    nif:oliaConf
        "0.701014272348203"^^xsd:double ;
    nif:posTag
        "NNP"^^xsd:string ;
    nif:previousWord
        content:char=4,10 ;
    nif:referenceContext
        content:char=0 ;
    nif:sentence
        content:char=0,56 ;
    nif:subString
        content:char=11,27 .
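
The word segments above form a doubly linked chain through nif:previousWord and nif:nextWord. The Python sketch below shows one way such links could be derived from character offsets. It uses a simple regular expression instead of OpenNLP and ignores punctuation, so it is an approximation for illustration only; the function word_segments and its dictionary layout are hypothetical.

```python
import re

TEXT = "The Apache Stanbol Enhancer can detect entities in text."


def word_segments(text):
    """Return one dict per word with offset-based ids and neighbour links.

    Assumption: words are runs of word characters. A real tokenizer such
    as OpenNLP may split the text differently.
    """
    spans = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    ids = [f"char={begin},{end}" for begin, end in spans]
    segments = []
    for i, (begin, end) in enumerate(spans):
        segments.append({
            "id": ids[i],
            "anchorOf": text[begin:end],
            # The first word has no predecessor, the last word no successor.
            "previousWord": ids[i - 1] if i > 0 else None,
            "nextWord": ids[i + 1] if i + 1 < len(ids) else None,
        })
    return segments


words = word_segments(TEXT)
for word in words[:3]:
    print(word)
```

For the first three words this yields the ids char=0,3, char=4,10 and char=11,18 with the same previous and next links as in the Turtle snippet.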

Phrases are also exported as RDF. Here is an example of a verb phrase, followed by the segment for the verb, which links to the phrase using nif:subString.

content:char=28,38
    a nif:Phrase ,  nif:RFC5147String ;
    nif:before
        " Enhancer "@en ;
    nif:anchorOf
        "can detect"@en ;
    nif:after
        " entities "@en ;
    nif:beginIndex
        "28"^^xsd:int ;
    nif:endIndex
        "38"^^xsd:int ;
    nif:oliaCategory
         olia:VerbPhrase ;
    nif:oliaConf
        "0.9864510669287669"^^xsd:double ;
    nif:referenceContext
        content:char=0 ;
    nif:superString
        content:char=0,56 .

content:char=32,38
    a nif:RFC5147String ,  nif:Word ;
    nif:before
        "ancer can "@en ;
    nif:anchorOf
        "detect"@en ;
    nif:after
        " entities "@en ;
    nif:beginIndex
        "32"^^xsd:int ;
    nif:endIndex
        "38"^^xsd:int ;
    nif:nextWord
        content:char=39,47 ;
    nif:oliaCategory
         olia:Infinitive ,  olia:Verb ;
    nif:oliaConf
        "0.9930989756397197"^^xsd:double ;
    nif:posTag
        "VB"^^xsd:string ;
    nif:previousWord
        content:char=28,31 ;
    nif:referenceContext
        content:char=0 ;
    nif:sentence
        content:char=0,56 ;
    nif:subString
        content:char=28,38 .
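
In the example above, the word char=32,38 links to the phrase char=28,38, and the phrase links to the sentence char=0,56. In each case the linked segment fully encloses the linking one. The Python sketch below checks this containment from the fragment identifiers alone. The helper names parse_span and encloses are hypothetical, and the sketch makes no claim about how the engine decides which links to emit.

```python
# Sketch only: offset containment between 'char=begin,end' segments.


def parse_span(fragment):
    """Parse a fragment such as 'char=28,38' into the tuple (28, 38)."""
    begin, end = fragment.split("=", 1)[1].split(",")
    return int(begin), int(end)


def encloses(outer, inner):
    """Return True if the span of 'outer' fully covers the span of 'inner'."""
    outer_begin, outer_end = parse_span(outer)
    inner_begin, inner_end = parse_span(inner)
    return outer_begin <= inner_begin and inner_end <= outer_end


print(encloses("char=28,38", "char=32,38"))  # phrase covers the verb
print(encloses("char=0,56", "char=28,38"))   # sentence covers the phrase
print(encloses("char=28,38", "char=39,47"))  # the next word lies outside
```

The first two checks print True and the third prints False, which matches the nesting of word, phrase and sentence in the Turtle snippets.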