This project has retired. For details please refer to its Attic page.
Apache Stanbol - Stanbol Enhancer

Stanbol Enhancer

The Apache Stanbol Enhancer provides both a RESTful and a Java API that allow a caller to extract features from the content passed to it. More precisely, the passed content is processed by the Enhancement Engines defined by the called Enhancement Chain.

Using the Stanbol Enhancer

The figure below provides an overview of the RESTful API and the Java API provided by the Stanbol Enhancer.

[Figure: Stanbol Enhancer Overview]

RESTful service

The content to be analyzed should be sent in a POST request with its MIME type specified in the Content-Type header. The response holds the RDF enhancements serialized in the format specified in the Accept header:

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
    --data "The Stanbol enhancer can detect famous cities such as \
            Paris and people such as Bob Marley." \
    http://localhost:8080/enhancer
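
The same request can also be built from Java. The following is a minimal sketch, assuming Java 11 or later and the JDK's java.net.http client. The class name EnhancerRestSketch and the method buildEnhanceRequest are hypothetical names chosen for illustration and are not part of Stanbol; the explicit charset=UTF-8 parameter is likewise an addition to the curl call above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class EnhancerRestSketch {

    /** Builds a POST request equivalent to the curl call: plain text in, Turtle out. */
    static HttpRequest buildEnhanceRequest(String endpoint, String text) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Accept", "text/turtle")
                // The charset parameter is an assumption; curl sends plain "text/plain".
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildEnhanceRequest(
                "http://localhost:8080/enhancer",
                "The Stanbol enhancer can detect famous cities such as "
                        + "Paris and people such as Bob Marley.");
        // Sending only works with a Stanbol instance running on localhost:8080.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // the enhancements serialized as Turtle
    }
}
```

Building the request separately from sending it keeps the header setup easy to inspect without a running server.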

The RESTful interface also provides parameters that can be used to pass or request additional information. The following example shows a request that is answered with the text/plain version of the submitted HTML content.

curl -v -X POST -H "Accept: text/plain" \
    -H "Content-type: text/html; charset=UTF-8" \
    --data "<html><body><p>The Stanbol enhancer can detect famous cities \
            such as Paris and people such as Bob Marley.</p></body></html>" \
    "http://localhost:8080/enhancer/chain/language?omitMetadata=true"

For detailed information please see the documentation of the Stanbol Enhancer RESTful Services. A short version is also provided under the REST API link of the Stanbol Web UI (e.g. http://localhost:8080/enhancer assuming that Apache Stanbol runs on localhost:8080).
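
The chain-specific URL used in the second curl call above follows the pattern of the enhancer endpoint, then /chain/, then the chain name, with optional query parameters appended. A minimal sketch of composing such a URL in Java is given below; ChainUriSketch and chainUri are hypothetical helper names, not Stanbol API, and the base URL is assumed to have no trailing slash.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ChainUriSketch {

    /**
     * Builds the URI of a named enhancement chain below the enhancer endpoint,
     * optionally adding the omitMetadata query parameter.
     */
    static URI chainUri(String enhancerBase, String chainName, boolean omitMetadata) {
        // URLEncoder targets form data and writes spaces as "+",
        // so map them to "%20" to get a valid path segment.
        String segment = URLEncoder.encode(chainName, StandardCharsets.UTF_8)
                .replace("+", "%20");
        String uri = enhancerBase + "/chain/" + segment;
        if (omitMetadata) {
            uri += "?omitMetadata=true";
        }
        return URI.create(uri);
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/enhancer/chain/language?omitMetadata=true
        System.out.println(chainUri("http://localhost:8080/enhancer", "language", true));
    }
}
```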

Java API

Using the Java API requires the following OSGi services:

@Reference
EnhancementJobManager jobManager;
@Reference
ChainManager chainManager;

This code snippet shows how to enhance an HTML document:

InputStream content; //the content (assuming an HTML document)
String chainName; //the name of the chain or null to use the default
ContentItem contentItem = new InMemoryContentItem(
    IOUtils.toByteArray(content), "text/html; charset=UTF-8");
//get the EnhancementChain
Chain enhancementChain;
if(chainName == null){
    enhancementChain = chainManager.getDefault();
} else {
    enhancementChain = chainManager.getChain(chainName);
}
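try { //enhance the content
    jobManager.enhanceContent(contentItem, enhancementChain);
} catch (EnhancementException e) {
    //do not silently ignore failures of the enhancement process
    throw new IllegalStateException("Unable to enhance the content", e);
}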

//Get the enhancement Results
MGraph enhancements = contentItem.getMetadata();

After the enhancement process, a ContentItem contains not only the metadata but also other information, such as converted versions of the passed content. The following code snippet shows how to retrieve the plain text version of the passed HTML content, as created, for example, by the Metaxa Engine.

//look up the first content part with the media type text/plain
Entry<UriRef,Blob> textContentPart =
        ContentItemHelper.getBlob(contentItem,
            Collections.singleton("text/plain"));
Blob textBlob = textContentPart.getValue();
//use the charset parameter of the Blob and fall back to UTF-8
String charset = textBlob.getParameter().get("charset");
String plainText = IOUtils.toString(
    textBlob.getStream(),
    charset == null ? "UTF-8" : charset);
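
The charset fallback used above can be tried without a Stanbol installation. The following is a minimal sketch using only the JDK (Java 9 or later for readAllBytes); BlobTextSketch and readText are hypothetical names, and the parameter map merely stands in for the Blob parameters, it is not the Stanbol API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class BlobTextSketch {

    /** Decodes the stream with the "charset" parameter, or with UTF-8 if it is absent. */
    static String readText(InputStream stream, Map<String, String> parameters) {
        String name = parameters.get("charset");
        // Charset.forName throws for illegal or unsupported names;
        // handling that case is left to the caller in this sketch.
        Charset charset = name == null ? StandardCharsets.UTF_8 : Charset.forName(name);
        try (InputStream in = stream) {
            return new String(in.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "Caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        String text = readText(new ByteArrayInputStream(latin1),
                Map.of("charset", "ISO-8859-1"));
        System.out.println(text.length()); // prints 4
    }
}
```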

List of Available Enhancement Engines

Apache Stanbol comes with a list of enhancement engine implementations. These engines are supported by the Apache Stanbol community. If you would like to implement your own enhancement engine, continue reading this documentation.

Main Interfaces and Utilities

Note that the "org.apache.stanbol.enhancer.servicesapi" module also provides a set of "*Helper" utility classes (e.g. ContentItemHelper, EnhancementEngineHelper …). Users are strongly encouraged to use the functionality provided by these helpers when working with the corresponding classes of the Stanbol Enhancer.

Enhancement Structure

The enhancement structure for Apache Stanbol is described here in full. It defines the types and properties used for the resulting metadata graph of Apache Stanbol.

Note: The currently used Enhancement Structure was defined before the incubation to Apache. There is a proposal and an ongoing discussion to update this structure in the future; however, the decision was to keep the current structure until a first release.

Each enhancement type description contains the following important properties:

A text annotation type provides metadata for the selected text. This is intended to be used in addition to the enhancement type if an enhancement is based on a part of the content.

The entity annotation type refers to named entities which have been recognized within the content. This type is intended to be used together with the FISE enhancement type.