Enhancement Engines and their main features
This page provides an overview of all Enhancement Engine implementations managed by the Apache Stanbol community.
Preprocessing
-
Language Identification Engine
- language detection for textual content utilizing Apache Tika
-
Tika Engine (based on Apache Tika)
- content type detection
- text extraction from various document formats
- extraction of metadata from document formats
-
- text extraction from various document formats
- extraction of metadata from document formats
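Content type detection, as listed for the Tika Engine, usually starts by inspecting a document's leading "magic" bytes. The Python sketch below illustrates that idea with a small, hand-picked signature table. It is not Apache Tika's API or detection logic, and the names `MAGIC_SIGNATURES` and `detect_content_type` are hypothetical. Real detectors such as Tika use far larger signature databases and additional hints, for example file name patterns.

```python
from typing import List, Tuple

# Hypothetical, hand-picked signature table: (leading bytes, media type).
# Real detectors use far larger signature databases than this.
MAGIC_SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # also the container format of e.g. .docx/.odt
    (b"<?xml", "application/xml"),
]


def detect_content_type(data: bytes) -> str:
    """Guess a media type from the first bytes of a document (sketch only)."""
    for prefix, media_type in MAGIC_SIGNATURES:
        if data.startswith(prefix):
            return media_type
    # No signature matched: treat valid UTF-8 as plain text, anything else as binary.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"


if __name__ == "__main__":
    print(detect_content_type(b"%PDF-1.7\n"))      # application/pdf
    print(detect_content_type("Grüße".encode()))   # text/plain
```

Because ZIP-based office formats share one signature, telling a .docx from a plain .zip requires looking inside the container, which is one reason production detectors go well beyond a prefix table like this.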
Natural Language Processing
-
Named Entity Extraction Enhancement Engine
- NLP processing using OpenNLP NER
- detects occurrences of persons, places and organizations only
-
- NLP processing using OpenNLP
- supports multiple languages
- detects occurrences of untyped entities as concepts and uses local taxonomies as the linking target
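The second engine in this group detects untyped concepts by matching text against the labels of a local taxonomy. A minimal sketch of such dictionary-based linking follows, assuming a hypothetical in-memory taxonomy (lower-cased label to concept URI, with placeholder `example.org` URIs) and a greedy longest-match strategy over word tokens. It leaves out the OpenNLP processing the engine performs and is only meant to show the matching idea; `link_concepts` and `Mention` are illustrative names.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical local taxonomy: lower-cased label -> concept URI (placeholder URIs).
TAXONOMY: Dict[str, str] = {
    "apache stanbol": "http://example.org/concept/ApacheStanbol",
    "semantic web": "http://example.org/concept/SemanticWeb",
    "linked data": "http://example.org/concept/LinkedData",
    "web": "http://example.org/concept/Web",
}


class Mention(NamedTuple):
    surface: str  # text as it appears in the input
    start: int    # character offset where the mention starts
    end: int      # character offset just past the mention
    concept: str  # URI of the linked taxonomy concept


def link_concepts(text: str, taxonomy: Dict[str, str], max_tokens: int = 3) -> List[Mention]:
    """Greedy longest-match linking of word n-grams against taxonomy labels."""
    tokens = list(re.finditer(r"\w+", text))
    mentions: List[Mention] = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate first so "semantic web" wins over "web".
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            label = " ".join(t.group(0).lower() for t in tokens[i:i + n])
            concept = taxonomy.get(label)
            if concept is not None:
                start, end = tokens[i].start(), tokens[i + n - 1].end()
                mentions.append(Mention(text[start:end], start, end, concept))
                matched = n
                break
        # Skip past the matched tokens, or advance by one token if nothing matched.
        i += matched or 1
    return mentions
```

For example, `link_concepts("Apache Stanbol links text to the Semantic Web.", TAXONOMY)` yields mentions for "Apache Stanbol" and "Semantic Web". Preferring the longest label avoids also linking the shorter, more generic "web" inside "Semantic Web".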
Linking Suggestions
-
- suggests links to several Linked Data sources (e.g. DBpedia)
-
- suggests links to geonames.org
- provides hierarchical links for locations
-
- integrates the Open Calais service (Note: an API key is required to use this engine)
-
- integrates the Zemanta services (Note: an API key is required to use this engine)
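The hierarchical links mentioned for the geonames.org engine amount to walking a place's chain of parent features. The sketch below assumes a hypothetical in-memory child-to-parent map standing in for the gazetteer; it does not call the geonames.org web services, and `PARENT` and `location_hierarchy` are illustrative names only.

```python
from typing import Dict, List, Optional

# Hypothetical child -> parent relation standing in for a gazetteer such as geonames.org.
PARENT: Dict[str, Optional[str]] = {
    "Salzburg (city)": "Salzburg (state)",
    "Salzburg (state)": "Austria",
    "Austria": "Europe",
    "Europe": None,
}


def location_hierarchy(place: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the ancestors of `place`, nearest first (e.g. state, country, continent)."""
    ancestors: List[str] = []
    current = parent.get(place)
    while current is not None:
        # Guard against malformed data that would otherwise loop forever.
        if current in ancestors or current == place:
            raise ValueError(f"cycle in hierarchy at {current!r}")
        ancestors.append(current)
        current = parent.get(current)
    return ancestors
```

Each ancestor returned here could then be offered as an additional, broader link next to the suggestion for the place itself; unknown places simply yield an empty list.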
Postprocessing / Other
-
CachingDereferencerEngine (deprecated; see the dereferencing support of the individual engines as well as STANBOL-336)
- retrieves additional content for presenting the enhancement results
-
- transforms enhancements according to a target ontology; requires the KRES launcher
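Transforming enhancements according to a target ontology can be pictured as rewriting RDF triples through a vocabulary mapping. The following Python sketch assumes enhancements are plain (subject, predicate, object) tuples and that a hypothetical term map is given (placeholder `example.org` URIs, not the real Stanbol vocabularies). It only illustrates the idea and makes no claim about how the engine applies its transformations; `refactor` and `TERM_MAP` are illustrative names.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from source terms to target-ontology terms (placeholder URIs).
TERM_MAP: Dict[str, str] = {
    "http://example.org/src/Person": "http://example.org/target/Agent",
    "http://example.org/src/label": "http://example.org/target/name",
}


def refactor(triples: List[Triple], term_map: Dict[str, str]) -> List[Triple]:
    """Rewrite predicates, and the classes used with rdf:type, via `term_map`."""
    result: List[Triple] = []
    for subject, predicate, obj in triples:
        new_predicate = term_map.get(predicate, predicate)
        # Map an object only when it is the class of an rdf:type statement;
        # other objects (literals, entity URIs) are kept as-is.
        new_obj = term_map.get(obj, obj) if predicate == RDF_TYPE else obj
        result.append((subject, new_predicate, new_obj))
    return result
```

A one-to-one term map like this covers simple renamings only; transformations that change the shape of the graph (splitting or merging resources, for instance) would need rule-based rewriting rather than a lookup table.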