NIF 2.0 Transformation Engine
Typically, low-level NLP results are not included in the RDF enhancement results. This engine supports the serialization of such results by using the NIF 2.0 (NLP Interchange Format) standard.
Processed Information (Input)
Apache Stanbol manages NLP results in the Analysed Text content part. This ContentPart provides a Java API for accessing those results. This engine reads that information and transforms it according to the NIF 2.0 core ontology. The transformed information is added as RDF to the Enhancement Metadata and included in the RDF response of the enhancement request.
If a ContentItem does not contain this content part, it will not be processed by this engine.
Created RDF
The engine serializes NLP annotations as defined by the NIF 2.0 core ontology. More specifically, the engine is capable of serializing the following information:
- Segment URIs use RFC 5147. It can be configured whether the nif:RFC5147String type is added only to the nif:Context instance or to all serialized nif:String instances.
- Selector information like nif:beginIndex, nif:endIndex as well as nif:before, nif:anchorOf and nif:after. For spans longer than 100 chars the nif:head property is used instead of nif:anchorOf. There is an option to prevent these properties from being serialized. This will greatly decrease the triple count; however, clients will then need to parse the start/end positions from the segment URI.
- All serialized nif:String instances refer to the nif:Context via the nif:referenceContext property. The context refers to the URI of the ContentItem by using the nif:sourceUrl property. The inclusion of the content as String literal is NOT supported by this engine.
- String hierarchies: This includes the nif:subWord, nif:superWord and nif:sentence properties. If not required, serialization of those can be deactivated.
- String navigation: This includes the nif:nextSentence, nif:previousSentence, nif:nextWord and nif:previousWord properties. The transitive versions of those properties are NOT supported. Users that need transitive reasoning will get those from the reasoner anyway. String navigation properties can be deactivated. This will greatly decrease the triple count.
- String annotations: This currently includes nif:oliaCategory, nif:oliaConfidence and nif:posTag. nif:oliaLink is not supported, as the Stanbol NLP API does not provide the required information. Support for word-level sentiment annotations is not yet implemented either.
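When selector properties are not serialized, a client has to recover the offsets from the segment URI itself. The following Python sketch shows one way to build and parse the two fragment forms that appear in this document (`char=0` and `char=0,56`). The function names `segment_uri` and `parse_segment_uri` are hypothetical helpers, not part of Stanbol, and other RFC 5147 features (for example `line=` fragments or integrity checks) are deliberately not handled.

```python
import re

# Matches only the RFC 5147 character fragments used in this document:
# "char=0" (a single position) or "char=0,56" (a range).
_CHAR_FRAGMENT = re.compile(r"^char=(\d+)(?:,(\d+))?$")


def segment_uri(content_uri, start, end=None):
    """Build a segment URI such as '<content_uri>#char=0,56'."""
    fragment = f"char={start}" if end is None else f"char={start},{end}"
    return f"{content_uri}#{fragment}"


def parse_segment_uri(uri):
    """Return (content_uri, start, end); end is None for 'char=N' fragments."""
    content_uri, sep, fragment = uri.rpartition("#")
    match = _CHAR_FRAGMENT.match(fragment) if sep else None
    if match is None:
        raise ValueError(f"not an RFC 5147 char fragment: {uri!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else None
    return content_uri, start, end
```

Parsing the URI keeps the client independent of whether nif:beginIndex and nif:endIndex were written, at the cost of a small amount of string handling per segment.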
Configuration
The engine supports several switches that enable or disable the serialization of specific NIF information. Multiple instances of the engine can be configured with different settings. The following figure shows the configuration dialog:

- Selector (enhancer.engines.nlp2rdf.selector): Enables or disables the serialization of selector-related properties such as nif:beginIndex, nif:endIndex, nif:before, nif:anchorOf and nif:after. If disabled, clients can still parse the start/end indexes from the RFC 5147 encoded segment URI.
- Hierarchy (enhancer.engines.nlp2rdf.hierarchy): Enables or disables the writing of hierarchical links. This includes the olia:sentence, olia:superString and olia:subString properties.
- Previous and Next Links (enhancer.engines.nlp2rdf.previousNext): Enables or disables the serialization of links to the previous/next sentence/word.
- Context only URI Scheme (enhancer.engines.nlp2rdf.cotextOnlyUriScheme): If enabled, the nif:RFC5147String type is added only as rdf:type of the nif:Context. If disabled, the nif:RFC5147String rdf:type is added to all segments.
- String Type (enhancer.engines.nlp2rdf.writeStringType): If enabled, the nif:String type is added to all serialized segments. If disabled, only more specific types like nif:Sentence or nif:Word are used.
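The effect of the last two switches on the rdf:type values of a segment can be summarised in a few lines of Python. This is only an illustrative model of the behaviour described above, not the engine's code: the function `segment_types` and its parameter names are hypothetical, and treating the nif:Context as a segment that also receives nif:String is an assumption.

```python
def segment_types(specific_type, is_context,
                  context_only_uri_scheme, write_string_type):
    """Model which rdf:type values a serialized segment receives.

    specific_type: e.g. "nif:Context", "nif:Sentence" or "nif:Word".
    is_context: True for the nif:Context instance itself.
    context_only_uri_scheme: models the "Context only URI Scheme" switch.
    write_string_type: models the "String Type" switch.
    """
    types = [specific_type]
    # The URI scheme type always goes on the context; on other segments
    # only when the "context only" switch is disabled.
    if is_context or not context_only_uri_scheme:
        types.append("nif:RFC5147String")
    # Assumption: the generic nif:String type is added to every segment,
    # including the context, when the switch is enabled.
    if write_string_type:
        types.append("nif:String")
    return sorted(types)


if __name__ == "__main__":
    # Both switches disabled: specific type plus the URI scheme type.
    print(segment_types("nif:Sentence", False, False, False))
    # "Context only" enabled: words no longer carry nif:RFC5147String.
    print(segment_types("nif:Word", False, True, False))
```

With both switches disabled the model yields the specific type together with nif:RFC5147String for every segment, which keeps each segment self-describing but adds one type triple per segment.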
Examples
This section provides some examples of RDF generated by this engine. OpenNLP was used to create the serialized NLP annotations. The sentence "The Apache Stanbol Enhancer can detect entities in text." was used for generating this example.
@prefix content: <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix olia: <http://purl.org/olia/olia.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
The first Turtle snippet shows the nif:Context instance. This is referenced by all segments and it will refer to the URI of the ContentItem by using the nif:sourceUrl.
content:char=0
a nif:Context , nif:RFC5147String ;
nif:anchorOf
"The Apache Stanbol Enhancer can detect entities in text."@en ;
nif:beginIndex
"0"^^xsd:int ;
nif:endIndex
"56"^^xsd:int ;
nif:sourceUrl
<urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .
Next is the segment describing the only sentence in the example text. NOTE: if nif:before or nif:after is an empty string, the segment starts or ends at the beginning or end of the parsed content.
content:char=0,56
a nif:RFC5147String , nif:Sentence ;
nif:before
""@en ;
nif:anchorOf
"The Apache Stanbol Enhancer can detect entities in text."@en ;
nif:after
""@en ;
nif:beginIndex
"0"^^xsd:int ;
nif:endIndex
"56"^^xsd:int ;
nif:firstWord
content:char=0,3 ;
nif:referenceContext
content:char=0 .
The following snippet shows the segments for the first three words of the Sentence.
content:char=0,3
a nif:RFC5147String , nif:Word ;
nif:before
""@en ;
nif:anchorOf
"The"@en ;
nif:after
" Apache St"@en ;
nif:beginIndex
"0"^^xsd:int ;
nif:endIndex
"3"^^xsd:int ;
nif:nextWord
content:char=4,10 ;
nif:oliaCategory
olia:Determiner , olia:PronounOrDeterminer ;
nif:oliaConf
"0.9662179110607207"^^xsd:double ;
nif:posTag
"DT"^^xsd:string ;
nif:referenceContext
content:char=0 ;
nif:sentence
content:char=0,56 ;
nif:subString
content:char=0,10 .
content:char=4,10
a nif:RFC5147String , nif:Word ;
nif:before
"The "@en ;
nif:anchorOf
"Apache"@en ;
nif:after
" Stanbol E"@en ;
nif:beginIndex
"4"^^xsd:int ;
nif:endIndex
"10"^^xsd:int ;
nif:nextWord
content:char=11,18 ;
nif:oliaCategory
olia:Noun , olia:PluralQuantifier , olia:ProperNoun , olia:Quantifier ;
nif:oliaConf
"0.7882547205652428"^^xsd:double ;
nif:posTag
"NNPS"^^xsd:string ;
nif:previousWord
content:char=0,3 ;
nif:referenceContext
content:char=0 ;
nif:sentence
content:char=0,56 ;
nif:subString
content:char=0,10 .
content:char=11,18
a nif:RFC5147String , nif:Word ;
nif:before
"he Apache "@en ;
nif:anchorOf
"Stanbol"@en ;
nif:after
" Enhancer "@en ;
nif:beginIndex
"11"^^xsd:int ;
nif:endIndex
"18"^^xsd:int ;
nif:nextWord
content:char=19,27 ;
nif:oliaCategory
olia:Noun , olia:ProperNoun , olia:Quantifier , olia:SingularQuantifier ;
nif:oliaConf
"0.701014272348203"^^xsd:double ;
nif:posTag
"NNP"^^xsd:string ;
nif:previousWord
content:char=4,10 ;
nif:referenceContext
content:char=0 ;
nif:sentence
content:char=0,56 ;
nif:subString
content:char=11,27 .
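The selector values in the snippets above can be reproduced from the plain text and the offsets. The Python sketch below assumes that nif:before and nif:after hold up to ten characters of surrounding context; that window size is inferred from the examples and is not a documented setting. The helper name `selector` is hypothetical, and the nif:head case for spans longer than 100 chars is not modelled.

```python
# Assumption: the context window is 10 characters, as suggested by the
# nif:before / nif:after values in the example snippets.
CONTEXT_CHARS = 10


def selector(text, begin, end, context_chars=CONTEXT_CHARS):
    """Return selector values for the span text[begin:end]."""
    return {
        "nif:beginIndex": begin,
        "nif:endIndex": end,
        # Empty when the span starts at the beginning of the content.
        "nif:before": text[max(0, begin - context_chars):begin],
        "nif:anchorOf": text[begin:end],
        # Empty when the span ends at the end of the content.
        "nif:after": text[end:end + context_chars],
    }


if __name__ == "__main__":
    sentence = "The Apache Stanbol Enhancer can detect entities in text."
    for begin, end in [(0, 3), (4, 10), (11, 18)]:
        print(selector(sentence, begin, end))
```

Because slicing never reads past the ends of the string, the empty nif:before and nif:after values for segments at the content boundaries fall out without special cases.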
Phrases are also exported as RDF. Here is an example of a verb phrase, followed by the segment for the verb, which links to the phrase using nif:subString.
content:char=28,38
a nif:Phrase , nif:RFC5147String ;
nif:before
" Enhancer "@en ;
nif:anchorOf
"can detect"@en ;
nif:after
" entities "@en ;
nif:beginIndex
"28"^^xsd:int ;
nif:endIndex
"38"^^xsd:int ;
nif:oliaCategory
olia:VerbPhrase ;
nif:oliaConf
"0.9864510669287669"^^xsd:double ;
nif:referenceContext
content:char=0 ;
nif:superString
content:char=0,56 .
content:char=32,38
a nif:RFC5147String , nif:Word ;
nif:before
"ancer can "@en ;
nif:anchorOf
"detect"@en ;
nif:after
" entities "@en ;
nif:beginIndex
"32"^^xsd:int ;
nif:endIndex
"38"^^xsd:int ;
nif:nextWord
content:char=39,47 ;
nif:oliaCategory
olia:Infinitive , olia:Verb ;
nif:oliaConf
"0.9930989756397197"^^xsd:double ;
nif:posTag
"VB"^^xsd:string ;
nif:previousWord
content:char=28,31 ;
nif:referenceContext
content:char=0 ;
nif:sentence
content:char=0,56 ;
nif:subString
content:char=28,38 .
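The nif:previousWord and nif:nextWord links shown in the word segments follow directly from the order of the word offsets. As a rough sketch, assuming the word spans are already known and given in document order, the links could be derived as follows in Python. The function `word_links` is a hypothetical helper and does not reflect how the engine is implemented.

```python
def word_links(content_uri, words):
    """Map each word segment URI to its previous/next word links.

    words: list of (begin, end) offsets in document order.
    """
    def uri(span):
        begin, end = span
        return f"{content_uri}#char={begin},{end}"

    links = {}
    for i, span in enumerate(words):
        entry = {}
        if i > 0:  # the first word has no predecessor
            entry["nif:previousWord"] = uri(words[i - 1])
        if i + 1 < len(words):  # the last word has no successor
            entry["nif:nextWord"] = uri(words[i + 1])
        links[uri(span)] = entry
    return links


if __name__ == "__main__":
    # Word offsets that appear in the example snippets.
    spans = [(0, 3), (4, 10), (11, 18), (19, 27), (28, 31), (32, 38), (39, 47)]
    for segment, entry in word_links("urn:example", spans).items():
        print(segment, entry)
```

Only direct neighbours are linked, which matches the note that transitive versions of the navigation properties are not serialized.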

