This project has retired. For details please refer to its Attic page.
Apache Stanbol - Enhancement Engines and their main features

This page provides an overview of all Enhancement Engine implementations maintained by the Apache Stanbol community.

Preprocessing

Natural Language Processing (NLP)

This category contains Engines that process textual content sent to the Stanbol Enhancer.

Language Detection

Language detection engines add Language annotations, as defined by STANBOL-613, to the metadata of the ContentItem.

Sentence Detection

Sentence detection engines add Sentences to the AnalyzedText content part.

Tokenizer Engines

Tokenizer Engines are responsible for adding Tokens to the AnalyzedText content part.
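
As a rough illustration of what a tokenizer contributes, the following Python sketch splits a text into tokens and records the character offsets of each one. The `Token` class and the `tokenize` function are hypothetical stand-ins and not the Stanbol API, and the regular expression is a deliberately simple assumption about what counts as a token.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a token span in an analyzed text."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    text: str


def tokenize(text: str) -> List[Token]:
    """Split text into word and punctuation tokens, keeping their offsets."""
    # \w+ matches a run of word characters, [^\w\s] a single punctuation
    # character. Whitespace is skipped and produces no token.
    return [Token(m.start(), m.end(), m.group())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


tokens = tokenize("Stanbol enhances text.")
```

Keeping the offsets next to the token text is what lets later steps, such as POS tagging or chunking, attach their annotations to the same spans.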

Part of Speech (POS) Tagging

POS tagging engines add Part-of-Speech annotations to the Tokens present in the AnalyzedText content part.

Chunk/Phrase Detection

Chunker (or Phrase Detection) Engines add detected Chunks to the AnalyzedText content part. They also annotate each added Chunk with the type of the detected phrase.

Named Entity Recognition (NER) Engines

NER engines must write detected Named Entities as 'fise:TextAnnotation's to the metadata of the ContentItem. In addition, they may add NER annotations to Chunks in the AnalyzedText content part.
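
To give an idea of the information such an annotation carries, here is a minimal Python sketch that describes each detected mention as a plain dictionary. The property names (`fise:selected-text`, `fise:start`, `fise:end`, `dc:type`, `fise:confidence`) are assumed to follow the Stanbol Enhancement Structure and should be checked against its specification. The gazetteer lookup, the `recognize` function and the type values are purely illustrative and do not reflect how any Stanbol NER engine works.

```python
def recognize(text, gazetteer):
    """Toy NER: find gazetteer names in text and describe each mention."""
    annotations = []
    for name, entity_type in gazetteer.items():
        start = text.find(name)
        while start != -1:
            annotations.append({
                "rdf:type": "fise:TextAnnotation",
                "fise:selected-text": name,
                "fise:start": start,              # inclusive offset
                "fise:end": start + len(name),    # exclusive offset
                "dc:type": entity_type,           # assumed type value
                "fise:confidence": 1.0,           # exact match in this toy
            })
            start = text.find(name, start + len(name))
    # Return mentions in reading order.
    return sorted(annotations, key=lambda a: a["fise:start"])


gazetteer = {"Paris": "dbpedia-owl:Place", "Bob Marley": "dbpedia-owl:Person"}
annotations = recognize("Bob Marley played in Paris.", gazetteer)
```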

Morphological Analysis

This includes Engines that perform some sort of morphological analysis (e.g. lemmatization).

General NLP processing Engines

Linking / Suggestions

This category covers enhancement engines that suggest Entities for features present in the parsed content. An Entity is a uniquely identified resource. Typically it provides (or links to) further information such as the type, a description (text, pictures, videos, …), spatial and/or temporal context, links to other entities, and so on.
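
The idea of suggesting Entities for a mention can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Entity` record, the `suggest` function, the `example.org` identifiers and the exact label matching are hypothetical and are not taken from any Stanbol linking engine.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity record: a unique identifier plus some metadata."""
    uri: str
    labels: Tuple[str, ...]
    entity_type: str


def suggest(mention: str, index: List[Entity]) -> List[str]:
    """Return the URIs of all entities that have a label equal to the mention."""
    needle = mention.casefold()  # case-insensitive comparison
    return [e.uri for e in index
            if any(label.casefold() == needle for label in e.labels)]


index = [
    Entity("http://example.org/entity/apple-inc", ("Apple", "Apple Inc."), "Organisation"),
    Entity("http://example.org/entity/apple-fruit", ("Apple",), "Food"),
    Entity("http://example.org/entity/paris", ("Paris",), "Place"),
]
suggestions = suggest("apple", index)
```

A mention such as "apple" yields more than one suggestion here, which is exactly the ambiguity a later disambiguation step has to resolve.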

Sentiment Analysis

This includes Engines that perform word- or chunk-level sentiment classification on the AnalyzedText content part, as well as Engines that summarize those lower-level annotations into Sentiments for sentences, sections or the whole text. Sentiment summarizations are represented as 'fise:SentimentAnnotation's (TODO: not yet fully specified, see STANBOL-760).
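
The two levels described here, word-level classification and summarization, can be sketched as follows. The lexicon, its scores, the function names and the choice of a plain average are all assumptions made for illustration; the source does not say how Stanbol engines compute or combine sentiment values.

```python
# Hypothetical word-level sentiment lexicon: scores range from -1.0 to 1.0.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "slow": -0.5}


def word_sentiments(sentence):
    """Return the lexicon scores of all known words in a sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return [LEXICON[w] for w in words if w in LEXICON]


def summarize(sentences):
    """Average word scores per sentence, then average the sentence scores."""
    per_sentence = []
    for sentence in sentences:
        scores = word_sentiments(sentence)
        # A sentence without any known word is treated as neutral.
        per_sentence.append(sum(scores) / len(scores) if scores else 0.0)
    overall = sum(per_sentence) / len(per_sentence) if per_sentence else 0.0
    return per_sentence, overall


per_sentence, overall = summarize(
    ["The food was great.", "Service was slow and bad."]
)
```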

Disambiguation

Enhancement Engines in this category disambiguate Entities based on contextual information (e.g. whether "Apple" in a sentence refers to the fruit or the company). Based on that, such engines can adjust existing Entity suggestions or create new ones.
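
A minimal sketch of the "Apple" case, assuming each suggestion comes with a set of keywords describing the entity. The `disambiguate` function, the keyword sets, the `example.org` identifiers and the fixed bonus of 0.1 per shared word are hypothetical; real engines may use entirely different context models.

```python
def disambiguate(sentence, suggestions):
    """Re-rank entity suggestions by their overlap with the sentence's words."""
    context = {w.strip('.,!?"').lower() for w in sentence.split()}
    adjusted = []
    for s in suggestions:
        overlap = len(context & s["keywords"])
        # Raise the confidence of suggestions that share words with the context.
        adjusted.append({**s, "confidence": s["confidence"] + 0.1 * overlap})
    return sorted(adjusted, key=lambda s: s["confidence"], reverse=True)


suggestions = [
    {"uri": "http://example.org/entity/apple-fruit", "confidence": 0.5,
     "keywords": {"fruit", "tree", "eat", "juice"}},
    {"uri": "http://example.org/entity/apple-inc", "confidence": 0.5,
     "keywords": {"company", "iphone", "shares", "ceo"}},
]
ranked = disambiguate("Apple shares rose after the iPhone launch.", suggestions)
```

Both suggestions start with the same confidence. Only the words "shares" and "iPhone" in the sentence move the company ahead of the fruit.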

Postprocessing / Other

Post-processing engines are executed after the Semantic Analysis is done. Typical post-processing tasks are dereferencing information about linked entities, rewriting enhancements and filtering annotations (e.g. based on their confidence).
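
Filtering by confidence is the simplest of these tasks to sketch. The function name, the dictionary layout and the default threshold of 0.5 are assumptions made for this example only.

```python
def filter_by_confidence(annotations, threshold=0.5):
    """Keep only annotations whose confidence reaches the threshold."""
    # Annotations without a confidence value are treated as 0.0 and dropped
    # for any positive threshold.
    return [a for a in annotations if a.get("confidence", 0.0) >= threshold]


annotations = [
    {"text": "Paris", "confidence": 0.9},
    {"text": "Bob", "confidence": 0.3},
    {"text": "Marley"},
]
kept = filter_by_confidence(annotations)
```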

Dereference Entities

Enhancement Engines of this kind are responsible for retrieving additional information about linked Entities. They first query the enhancement results for referenced Entities, then check whether each entity can be dereferenced, and in a third step dereference the entity and add the retrieved information to the enhancement results.

Apache Stanbol provides a core implementation of an Entity Dereference Engine that can be extended for different information sources.
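
The three steps described in this section can be sketched as follows. This is an illustration under stated assumptions and not the Stanbol core implementation: the `dereference` function is hypothetical, a plain dictionary stands in for the information source, and the `fise:entity-reference` key is assumed to be the property that links an enhancement to its Entity.

```python
def dereference(enhancements, source):
    """Add data about referenced entities to the enhancement results."""
    # Step 1: query the enhancements for referenced entities.
    referenced = {e["fise:entity-reference"] for e in enhancements
                  if "fise:entity-reference" in e}
    entities = {}
    for uri in sorted(referenced):
        # Step 2: check whether this source can dereference the entity.
        if uri not in source:
            continue
        # Step 3: retrieve the data and add a copy of it to the results.
        entities[uri] = dict(source[uri])
    return {"enhancements": enhancements, "entities": entities}


# Hypothetical information source: entity URI -> known properties.
source = {
    "http://example.org/entity/paris": {"label": "Paris", "type": "Place"},
}
enhancements = [
    {"fise:entity-reference": "http://example.org/entity/paris"},
    {"fise:entity-reference": "http://example.org/entity/unknown"},
    {"fise:selected-text": "Paris"},
]
result = dereference(enhancements, source)
```

Swapping the dictionary for another lookup with the same "contains" and "get" behaviour mirrors the idea of extending one core engine for different information sources.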

Refactor Engines

Others

Deprecated

The Enhancement Engines listed below are no longer supported or were replaced by others.