This project has retired. For details please refer to its Attic page.

Apache Stanbol OpenNLP integration

OpenNLP is fully integrated with Apache Stanbol and is included in the default launcher configurations. The Full launcher includes all available language models, while the Stable launcher includes only the models for English.

Configuration and Customization

OpenNLP uses model files to provide the statistical models for different languages. Apache Stanbol supports loading such models via the DataFileProvider infrastructure. This allows models to be provided either by

Stanbol assumes that models follow these naming schemes

If models use different names, the model parameter of the corresponding OpenNLP EnhancementEngine must be used to configure the correct model name. See the documentation of the individual engines for details.

Stanbol Enhancer configuration

OpenNLP based NLP Enhancement Engines

Enhancement Chain configurations

OpenNLP supports both the NER-based Named Entity Linking processing chain and the POS-tagging-based Entity Linking processing chain.

Users who want to process texts using Named Entity Recognition will end up with Enhancement Chain configurations similar to:

tika;optional
langdetect
opennlp-token
opennlp-sentence
opennlp-ner
{your-named-entity-linking}

where {your-named-entity-linking} refers to an instance of the NamedEntityLinkingEngine configured for the user's controlled vocabulary. Users can also use multiple NamedEntityLinkingEngine configurations in the same chain. Users who want to use NER models for types other than Persons, Organizations or Places will need to use the CustomNerModelEngine instead of the opennlp-ner engine.

Note that the opennlp-token and opennlp-sentence engines are optional, as the opennlp-ner engine will perform those steps itself if tokens and sentences are not yet available. Including those engines explicitly in the chain is only required when custom configurations for tokenization and sentence detection (e.g. custom OpenNLP models) need to be applied.
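The entries in the chain above follow a simple pattern: an engine name, optionally followed by ";optional". The following Python sketch only illustrates how such a list can be read; it is not part of Stanbol. The names ChainEntry and parse_chain, and the engine name my-named-entity-linking (standing in for {your-named-entity-linking}), are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class ChainEntry(NamedTuple):
    """One line of a chain listing (hypothetical helper type)."""
    engine: str
    optional: bool


def parse_chain(lines: Iterable[str]) -> List[ChainEntry]:
    """Split each 'name[;param]' line into an engine name and an optional flag."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        name, *params = [part.strip() for part in line.split(";")]
        entries.append(ChainEntry(name, "optional" in params))
    return entries


# The NER chain from the text; the last entry is a placeholder name.
NER_CHAIN = [
    "tika;optional",
    "langdetect",
    "opennlp-token",
    "opennlp-sentence",
    "opennlp-ner",
    "my-named-entity-linking",  # hypothetical NamedEntityLinkingEngine instance
]

for entry in parse_chain(NER_CHAIN):
    marker = " (optional)" if entry.optional else ""
    print(entry.engine + marker)
```

In this listing only tika is marked optional; every other entry is treated as a required engine of the chain.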

A typical Entity Linking enhancement chain based on OpenNLP includes the following engines:

tika;optional
langdetect
opennlp-token
opennlp-sentence
opennlp-pos
opennlp-chunker
{your-entitylinking}

where {your-entitylinking} will typically be an EntityhubLinkingEngine configured for the user's controlled vocabulary. Users who need to link against multiple controlled vocabularies can add multiple EntityhubLinkingEngines to the enhancement chain.

Note that the opennlp-token and opennlp-sentence engines are optional, as the opennlp-pos engine will perform those steps itself if tokens and sentences are not yet available. Including those engines explicitly in the chain is only required when custom configurations for tokenization and sentence detection (e.g. custom OpenNLP models) need to be applied.
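The note above implies that the chain can be shortened when no custom tokenizer or sentence detection models are used. The following Python sketch illustrates that reduction on the engine list from the text. It is an illustration only, not a Stanbol API: simplify_chain and my-entitylinking (standing in for {your-entitylinking}) are hypothetical names.

```python
from typing import Iterable, List

# Engines whose work opennlp-pos performs itself when tokens and
# sentences are not yet available (as described in the text).
IMPLICIT_ENGINES = {"opennlp-token", "opennlp-sentence"}


def simplify_chain(engines: Iterable[str], custom_models: bool = False) -> List[str]:
    """Return the chain without explicit tokenizer/sentence engines.

    If custom_models is True, the explicit engines are kept, because they
    are the place where custom OpenNLP models would be configured.
    """
    if custom_models:
        return list(engines)
    return [
        entry for entry in engines
        if entry.split(";")[0].strip() not in IMPLICIT_ENGINES
    ]


# The POS-tagging-based chain from the text; the last entry is a placeholder.
POS_CHAIN = [
    "tika;optional",
    "langdetect",
    "opennlp-token",
    "opennlp-sentence",
    "opennlp-pos",
    "opennlp-chunker",
    "my-entitylinking",  # hypothetical EntityhubLinkingEngine instance
]

print(simplify_chain(POS_CHAIN))
```

Calling simplify_chain(POS_CHAIN, custom_models=True) returns the full list unchanged, which matches the case where custom tokenizer or sentence detection models need to be applied.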