Stanbol Enhancer Natural Language Processing Support

NOTE: The NLP processing module for the Apache Stanbol Enhancer was introduced in STANBOL-733 and is only available in Apache Stanbol Enhancer versions starting from 0.10.0.

Overview:

This section covers the following topics: Stanbol Natural Language Processing, the NLP processing API, and Stanbol Enhancer NLP Support.

Additional information can be found in the usage scenario on working with multiple languages.

Stanbol Natural Language Processing

The natural language processing module of the Stanbol Enhancer supports the usage of the following NLP processing techniques:

Based on those techniques, Stanbol supports two text enhancement processes, described in the following two subsections.

Named Entity Linking

This chain is based on named entity recognition (NER): it links the recognized entities with controlled vocabularies. A typical enhancement chain contains the following types of engines:

Entity Linking

This chain is based on part-of-speech, chunking and lemmatization analysis. It uses those results to look up words in a configured controlled vocabulary. A typical enhancement chain contains the following types of engines:

Additional information on how to configure Stanbol in multilingual environments is given in the usage scenario on working with multiple languages.
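As a rough illustration of the lookup step in the Entity Linking chain described above, the following sketch takes tokens that already carry a part-of-speech tag and a lemma and looks up the lemmas of nouns in a controlled vocabulary. It is only a sketch under stated assumptions: `EntityLinkingSketch`, `Token` and `link` are hypothetical names and not part of the Stanbol API, the vocabulary is a plain map with made-up `example.org` URIs, the tags follow the Penn Treebank convention, and chunking is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these types belong to Apache Stanbol.
final class EntityLinkingSketch {

    /** Hypothetical token holding the results of POS tagging and lemmatization. */
    static final class Token {
        final String text;
        final String lemma;
        final String pos;

        Token(String text, String lemma, String pos) {
            this.text = text;
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    /**
     * Looks up the lemma of every noun token in the vocabulary, which maps a
     * lower-case label to an entity URI. Returns the URIs in token order.
     */
    static List<String> link(List<Token> tokens, Map<String, String> vocabulary) {
        List<String> linked = new ArrayList<>();
        for (Token token : tokens) {
            // Assumption: Penn Treebank style tags, where noun tags start with "N".
            if (!token.pos.startsWith("N")) {
                continue;
            }
            String uri = vocabulary.get(token.lemma.toLowerCase(Locale.ROOT));
            if (uri != null) {
                linked.add(uri);
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("The", "the", "DT"),
                new Token("rivers", "river", "NNS"),
                new Token("flow", "flow", "VBP"),
                new Token("through", "through", "IN"),
                new Token("cities", "city", "NNS"));
        Map<String, String> vocabulary = Map.of(
                "river", "http://example.org/vocab/River",
                "city", "http://example.org/vocab/City",
                "flow", "http://example.org/vocab/Flow");
        System.out.println(link(tokens, vocabulary));
    }
}
```

In this example the lemmas "river" and "city" are linked, while "flow" is skipped even though the vocabulary contains it, because the token is tagged as a verb. Using the lemma rather than the surface form is what lets the plural "cities" match the vocabulary label "city".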

NLP processing API

The intention of the Stanbol NLP processing API is to efficiently handle word-level NLP annotations, something that was not possible using the RDF metadata of the ContentItem. Instead of RDF, the NLP processing API defines a Java API that consists of the following two main parts:

The NLP processing module also provides a default in-memory implementation of all defined interfaces. This implementation is used by default by the Stanbol Enhancer.
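To give an idea of what holding word-level annotations in memory rather than in RDF can look like, here is a minimal sketch. It does not reproduce the interfaces defined by the Stanbol NLP processing module: `InMemoryTextSketch`, `Span`, `addToken`, `annotate` and the `"pos"` key are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the interfaces of the Stanbol NLP processing module.
final class InMemoryTextSketch {

    /** A character range of the text plus free-form key/value annotations. */
    static final class Span {
        final int start;
        final int end;
        private final Map<String, Object> annotations = new HashMap<>();

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        void annotate(String key, Object value) {
            annotations.put(key, value);
        }

        Object annotation(String key) {
            return annotations.get(key);
        }
    }

    private final String text;
    private final List<Span> tokens = new ArrayList<>();

    InMemoryTextSketch(String text) {
        this.text = text;
    }

    /** Registers a token covering the characters from start (inclusive) to end (exclusive). */
    Span addToken(int start, int end) {
        if (start < 0 || end > text.length() || start >= end) {
            throw new IndexOutOfBoundsException("invalid span " + start + ".." + end);
        }
        Span span = new Span(start, end);
        tokens.add(span);
        return span;
    }

    /** Returns the covered text of a span. */
    String textOf(Span span) {
        return text.substring(span.start, span.end);
    }

    /** Returns the tokens in insertion order as a read-only view. */
    List<Span> tokens() {
        return Collections.unmodifiableList(tokens);
    }

    public static void main(String[] args) {
        InMemoryTextSketch analysed = new InMemoryTextSketch("Stanbol enhances text");
        analysed.addToken(0, 7).annotate("pos", "NNP");
        analysed.addToken(8, 16).annotate("pos", "VBZ");
        analysed.addToken(17, 21).annotate("pos", "NN");
        for (Span token : analysed.tokens()) {
            System.out.println(analysed.textOf(token) + " -> " + token.annotation("pos"));
        }
    }
}
```

In a model of this kind, each annotation is a direct field or map access on a span object, so reading the tag of a single word does not require querying a graph of metadata.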

Additionally, the NLP processing module provides:

Stanbol Enhancer NLP Support

This section provides an overview of the currently integrated NLP frameworks and their supported languages.

Integrated NLP frameworks

Supported Languages