NIF 2.0 Transformation Engine
Low-level NLP results are typically not included in the RDF enhancement results. This engine supports the serialization of such results using the NIF 2.0 (NLP Interchange Format) standard.
Processed Information (Input)
Apache Stanbol manages NLP results in the Analysed Text content part. This ContentPart provides a Java API for accessing those results. This engine reads that information and transforms it according to the NIF 2.0 core ontology. The transformed information is added as RDF to the Enhancement Metadata and is included in the RDF response of the enhancement request.
If a ContentItem does not contain this content part, it is not processed by this engine.
Created RDF
The engine serializes NLP annotations as defined by the NIF 2.0 core ontology. More specifically, the engine is capable of serializing the following information:
- Segment URIs use RFC 5147. It can be configured whether the nif:RFC5147String type is added only to the nif:Context instance or to all serialized nif:String instances.
- Selector information such as nif:beginIndex, nif:endIndex as well as nif:before, nif:anchorOf and nif:after. For spans longer than 100 characters the nif:head property is used instead of nif:anchorOf. There is an option to prevent these properties from being serialized. This greatly decreases the triple count; however, clients then need to parse the start/end positions from the segment URI.
- All serialized nif:String instances refer to the nif:Context via nif:referenceContext. The context refers to the URI of the ContentItem by using the nif:sourceUrl property. The inclusion of the content as a String literal is NOT supported by this engine.
- String hierarchies: This includes the nif:subString, nif:superString and nif:sentence properties. If not required, the serialization of those can be deactivated.
- String navigation: This includes the nif:nextSentence, nif:previousSentence, nif:nextWord and nif:previousWord properties. The transitive versions of those properties are NOT supported; users who need them can derive them with a reasoner. String navigation properties can be deactivated, which greatly decreases the triple count.
- String annotations: This currently includes nif:oliaCategory, nif:oliaConf and nif:posTag. nif:oliaLink is not supported, as the Stanbol NLP API does not provide the required information. Support for word-level sentiment annotations is not yet implemented.
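When selector properties are disabled, a client has to recover the offsets from the segment URI itself. The following is a minimal Python sketch of that step, not part of the engine. The function name parse_segment_uri is hypothetical, and the pattern only covers the plain "#char=begin" and "#char=begin,end" fragments shown in the examples on this page; other RFC 5147 features such as line positions or integrity checks are not handled.

```python
import re
from typing import Optional, Tuple

# Plain RFC 5147 character fragments as they appear in this page's examples:
# "#char=<begin>" for a position and "#char=<begin>,<end>" for a range.
_CHAR_FRAGMENT = re.compile(r"#char=(\d+)(?:,(\d+))?$")


def parse_segment_uri(uri: str) -> Tuple[int, Optional[int]]:
    """Return (begin, end) from the URI fragment; end is None for a position."""
    match = _CHAR_FRAGMENT.search(uri)
    if match is None:
        raise ValueError(f"no RFC 5147 char fragment in {uri!r}")
    begin = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else None
    return begin, end


word_uri = "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#char=4,10"
print(parse_segment_uri(word_uri))  # prints (4, 10)
```

With the offsets in hand, a client that still has the original text can slice it to obtain the covered string without needing nif:anchorOf.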
Configuration
The engine supports several switches that enable or disable the serialization of NIF information. Multiple instances of the engine can be configured, each with a different configuration. The following figure shows the configuration dialog:
- Selector (enhancer.engines.nlp2rdf.selector): Enables or disables the serialization of selector-related properties such as nif:beginIndex, nif:endIndex, nif:before, nif:anchorOf and nif:after. If disabled, clients can still parse the start/end indexes from the RFC 5147 encoded segment URI.
- Hierarchy (enhancer.engines.nlp2rdf.hierarchy): Enables or disables the writing of hierarchical links. This includes the nif:sentence, nif:superString and nif:subString properties.
- Previous and Next Links (enhancer.engines.nlp2rdf.previousNext): Enables or disables the serialization of links to the previous/next sentence and word.
- Context only URI Scheme (enhancer.engines.nlp2rdf.cotextOnlyUriScheme): If enabled, the nif:RFC5147String type that indicates the URI scheme is added as rdf:type only to the nif:Context. If disabled, the nif:RFC5147String rdf:type is added to all segments.
- String Type (enhancer.engines.nlp2rdf.writeStringType): If enabled, the nif:String type is added to all serialized segments. If disabled, only more specific types such as nif:Sentence or nif:Word are used.
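The effect of the two type-related switches can be illustrated with a small Python sketch. This is not the engine's code: the function rdf_types and the EXAMPLE_CONFIG values are hypothetical, the property keys are copied from the list above, and treating the nif:Context like any other segment for the String Type switch is an assumption.

```python
from typing import Dict, List

CONTEXT_ONLY = "enhancer.engines.nlp2rdf.cotextOnlyUriScheme"
WRITE_STRING_TYPE = "enhancer.engines.nlp2rdf.writeStringType"

# Illustrative values only; the engine's real defaults are not stated here.
EXAMPLE_CONFIG: Dict[str, bool] = {CONTEXT_ONLY: False, WRITE_STRING_TYPE: False}


def rdf_types(segment_type: str, config: Dict[str, bool]) -> List[str]:
    """Return the rdf:type values a segment would carry under the given switches."""
    types = [segment_type]
    # The URI scheme type always goes on the context; on other segments
    # only when the "context only" switch is disabled.
    if segment_type == "nif:Context" or not config[CONTEXT_ONLY]:
        types.append("nif:RFC5147String")
    # Assumption: the generic nif:String type is added to every segment
    # (including the context) when the String Type switch is enabled.
    if config[WRITE_STRING_TYPE]:
        types.append("nif:String")
    return types


print(rdf_types("nif:Word", EXAMPLE_CONFIG))
print(rdf_types("nif:Word", {CONTEXT_ONLY: True, WRITE_STRING_TYPE: True}))
```

With both switches disabled, the sketch yields a specific type plus nif:RFC5147String for every segment, which is the combination visible in the Examples section.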
Examples
This section provides some examples of the RDF generated by this engine. OpenNLP was used to create the serialized NLP annotations. The sentence "The Apache Stanbol Enhancer can detect entities in text." was used for generating this example.
@prefix content: <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix olia: <http://purl.org/olia/olia.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
The first Turtle snippet shows the nif:Context instance. It is referenced by all segments and refers to the URI of the ContentItem by using nif:sourceUrl.
content:char=0
    a nif:Context , nif:RFC5147String ;
    nif:anchorOf "The Apache Stanbol Enhancer can detect entities in text."@en ;
    nif:beginIndex "0"^^xsd:int ;
    nif:endIndex "56"^^xsd:int ;
    nif:sourceUrl <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .
Next comes the segment describing the only sentence in the example text. NOTE: an empty nif:before or nif:after string indicates that the segment starts at the beginning or ends at the end of the parsed content, respectively.
content:char=0,56
    a nif:RFC5147String , nif:Sentence ;
    nif:before ""@en ;
    nif:anchorOf "The Apache Stanbol Enhancer can detect entities in text."@en ;
    nif:after ""@en ;
    nif:beginIndex "0"^^xsd:int ;
    nif:endIndex "56"^^xsd:int ;
    nif:firstWord content:char=0,3 ;
    nif:referenceContext content:char=0 .
The following snippet shows the segments for the first three words of the Sentence.
content:char=0,3
    a nif:RFC5147String , nif:Word ;
    nif:before ""@en ;
    nif:anchorOf "The"@en ;
    nif:after " Apache St"@en ;
    nif:beginIndex "0"^^xsd:int ;
    nif:endIndex "3"^^xsd:int ;
    nif:nextWord content:char=4,10 ;
    nif:oliaCategory olia:Determiner , olia:PronounOrDeterminer ;
    nif:oliaConf "0.9662179110607207"^^xsd:double ;
    nif:posTag "DT"^^xsd:string ;
    nif:referenceContext content:char=0 ;
    nif:sentence content:char=0,56 ;
    nif:subString content:char=0,10 .

content:char=4,10
    a nif:RFC5147String , nif:Word ;
    nif:before "The "@en ;
    nif:anchorOf "Apache"@en ;
    nif:after " Stanbol E"@en ;
    nif:beginIndex "4"^^xsd:int ;
    nif:endIndex "10"^^xsd:int ;
    nif:nextWord content:char=11,18 ;
    nif:oliaCategory olia:Noun , olia:PluralQuantifier , olia:ProperNoun , olia:Quantifier ;
    nif:oliaConf "0.7882547205652428"^^xsd:double ;
    nif:posTag "NNPS"^^xsd:string ;
    nif:previousWord content:char=0,3 ;
    nif:referenceContext content:char=0 ;
    nif:sentence content:char=0,56 ;
    nif:subString content:char=0,10 .

content:char=11,18
    a nif:RFC5147String , nif:Word ;
    nif:before "he Apache "@en ;
    nif:anchorOf "Stanbol"@en ;
    nif:after " Enhancer "@en ;
    nif:beginIndex "11"^^xsd:int ;
    nif:endIndex "18"^^xsd:int ;
    nif:nextWord content:char=19,27 ;
    nif:oliaCategory olia:Noun , olia:ProperNoun , olia:Quantifier , olia:SingularQuantifier ;
    nif:oliaConf "0.701014272348203"^^xsd:double ;
    nif:posTag "NNP"^^xsd:string ;
    nif:previousWord content:char=4,10 ;
    nif:referenceContext content:char=0 ;
    nif:sentence content:char=0,56 ;
    nif:subString content:char=11,27 .
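In the word segments above, the nif:before and nif:after values never exceed 10 characters. The following Python sketch reproduces the selector values under the assumption that the engine uses a fixed 10-character window; that window size is inferred from the examples, not documented, and the function name selector is hypothetical. Spans longer than 100 characters, where nif:head replaces nif:anchorOf, are not covered.

```python
from typing import Dict, Union

TEXT = "The Apache Stanbol Enhancer can detect entities in text."


def selector(text: str, begin: int, end: int,
             window: int = 10) -> Dict[str, Union[int, str]]:
    """Compute selector values for the span [begin, end) of text."""
    return {
        "nif:beginIndex": begin,
        "nif:endIndex": end,
        # Up to `window` characters before the span; empty at the text start.
        "nif:before": text[max(0, begin - window):begin],
        "nif:anchorOf": text[begin:end],
        # Up to `window` characters after the span; empty at the text end.
        "nif:after": text[end:end + window],
    }


for name, value in selector(TEXT, 11, 18).items():
    print(name, repr(value))
```

For the span 11 to 18 this yields "he Apache " as nif:before, "Stanbol" as nif:anchorOf and " Enhancer " as nif:after, matching the third word segment.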
Phrases are also exported as RDF. Here is an example of a verb phrase, together with the segment for the verb, which links to the phrase using nif:subString.
content:char=28,38
    a nif:Phrase , nif:RFC5147String ;
    nif:before " Enhancer "@en ;
    nif:anchorOf "can detect"@en ;
    nif:after " entities "@en ;
    nif:beginIndex "28"^^xsd:int ;
    nif:endIndex "38"^^xsd:int ;
    nif:oliaCategory olia:VerbPhrase ;
    nif:oliaConf "0.9864510669287669"^^xsd:double ;
    nif:referenceContext content:char=0 ;
    nif:superString content:char=0,56 .

content:char=32,38
    a nif:RFC5147String , nif:Word ;
    nif:before "ancer can "@en ;
    nif:anchorOf "detect"@en ;
    nif:after " entities "@en ;
    nif:beginIndex "32"^^xsd:int ;
    nif:endIndex "38"^^xsd:int ;
    nif:nextWord content:char=39,47 ;
    nif:oliaCategory olia:Infinitive , olia:Verb ;
    nif:oliaConf "0.9930989756397197"^^xsd:double ;
    nif:posTag "VB"^^xsd:string ;
    nif:previousWord content:char=28,31 ;
    nif:referenceContext content:char=0 ;
    nif:sentence content:char=0,56 ;
    nif:subString content:char=28,38 .