This project has retired. For details please refer to its Attic page.

Apache Stanbol Enhancer

The Apache Stanbol Enhancer provides both a RESTful and a Java API that allow a caller to extract features from passed content. More specifically, the passed content is processed by the Enhancement Engines defined by the called Enhancement Chain.

Note that this is the technical documentation of the Stanbol Enhancer, intended for developers. For a more practical, use-case-oriented introduction to the Stanbol Enhancer and other components, please have a look at the available Usage Scenarios.

Using the Stanbol Enhancer

The figure below provides an overview of the RESTful and Java APIs provided by the Stanbol Enhancer.

Figure: Stanbol Enhancer Overview

RESTful API

The content to be analyzed should be sent in a POST request with its MIME type specified in the Content-type header. The passed content is then processed by the targeted Enhancement Chain. The response holds the RDF enhancements serialized in the format specified in the Accept header. The following figure visualizes this process.

Figure: Enhancing Content with the Stanbol Enhancer

You can test that easily from the command line using the curl command:

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
    --data "The Stanbol enhancer can detect famous cities such as \
            Paris and people such as Bob Marley." \
    http://localhost:8080/enhancer
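
The same request can also be issued from Java. The following is a minimal sketch, assuming Java 11 or later and the JDK's built-in java.net.http client rather than any Stanbol client library. The class and method names are illustrative only, and the URL assumes a Stanbol instance running on localhost:8080 as in the curl example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; not part of the Stanbol code base.
final class EnhancerRestSketch {

    /** Builds the POST request described above; nothing is sent here. */
    static HttpRequest buildEnhanceRequest(String enhancerUrl, String text) {
        return HttpRequest.newBuilder(URI.create(enhancerUrl))
                //requested serialization of the enhancement results
                .header("Accept", "text/turtle")
                //MIME type of the content sent in the request body
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = buildEnhanceRequest(
                "http://localhost:8080/enhancer", //assumed local Stanbol instance
                "The Stanbol enhancer can detect famous cities such as "
                        + "Paris and people such as Bob Marley.");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP status: " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            //e.g. no Stanbol instance is listening on the assumed URL
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

Separating request construction from sending keeps the header handling easy to inspect without a running server.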

The RESTful interface also provides parameters that can be used to pass or request additional information. The following example shows a request that returns the text/plain version extracted from the HTML content passed in the request.

curl -v -X POST -H "Accept: text/plain" \
    -H "Content-type: text/html; charset=UTF-8" \
    --data "<html><body><p>The Stanbol enhancer can detect famous cities \
            such as Paris and people such as Bob Marley.</p></body></html>" \
    "http://localhost:8080/enhancer/chain/language?omitMetadata=true"

For detailed information please see the documentation of the Stanbol Enhancer RESTful Services. A short version is also provided under the REST API link of the Stanbol Web UI (e.g. http://localhost:8080/enhancer assuming that Apache Stanbol runs on localhost:8080).
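
The chain-specific endpoint and the omitMetadata parameter used in the second curl example above can also be assembled programmatically. The sketch below assumes only the URL pattern visible in that example (the enhancer URL followed by /chain/ and the chain name); the class and method names are hypothetical, and only JDK classes are used.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative sketch; not part of the Stanbol code base.
final class ChainEndpointSketch {

    /**
     * Builds the URI of a named Enhancement Chain, following the pattern
     * of the curl example: [enhancerUrl]/chain/[chainName]
     */
    static URI chainEndpoint(String enhancerUrl, String chainName, boolean omitMetadata)
            throws URISyntaxException {
        URI base = URI.create(enhancerUrl);
        String basePath = base.getPath();
        if (basePath.endsWith("/")) { //avoid a double slash in the result
            basePath = basePath.substring(0, basePath.length() - 1);
        }
        String path = basePath + "/chain/" + chainName;
        String query = omitMetadata ? "omitMetadata=true" : null;
        //the multi-argument constructor quotes characters that are illegal in a URI
        return new URI(base.getScheme(), base.getAuthority(), path, query, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(chainEndpoint("http://localhost:8080/enhancer", "language", true));
    }
}
```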

Java API

Using the Java API requires the following OSGi services:

@Reference
EnhancementJobManager jobManager;
@Reference
ChainManager chainManager;

This code snippet shows how to enhance an HTML document:

InputStream content; //the content (assuming an HTML document)
String chainName; //the name of the chain or null to use the default
ContentItem contentItem = new InMemoryContentItem(
    IOUtils.toByteArray(content), "text/html; charset=UTF-8");
//get the EnhancementChain
Chain enhancementChain;
if(chainName == null){
    enhancementChain = chainManager.getDefault();
} else {
    enhancementChain = chainManager.getChain(chainName);
}
try { //enhance the content
    jobManager.enhanceContent(contentItem, enhancementChain);
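} catch (EnhancementException e) {
    //do not silently ignore failures: report them to the caller
    throw new IllegalStateException("Unable to enhance the content", e);
}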

//Get the enhancement Results
MGraph enhancements = contentItem.getMetadata();

After the enhancement process, ContentItems contain not only the metadata but also other information, such as converted versions of the passed content. The following code snippet shows how to retrieve the plain text version of the passed HTML content, as created by the Metaxa Engine.

Entry<UriRef,Blob> textContentPart = 
        ContentItemHelper.getBlob(contentItem, 
            Collections.singleton("text/plain"));
Blob textBlob = textContentPart.getValue();
//the charset is an optional parameter of the Blob's MIME type
String charset = textBlob.getParameter().get("charset");
String plainText = IOUtils.toString(
    textBlob.getStream(),
    charset == null ? "UTF-8" : charset); //fall back to UTF-8
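
The charset fallback in the snippet above only covers a missing parameter; a charset name that is malformed or unknown to the JVM would still cause an exception. The following sketch shows one defensive way to resolve the charset. It uses a plain Map as a stand-in for the value returned by Blob.getParameter(), and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Illustrative sketch; the Map stands in for the parameters of a Blob.
final class CharsetFallbackSketch {

    /** Resolves the "charset" parameter, falling back to UTF-8. */
    static Charset resolveCharset(Map<String, String> parameters) {
        String name = parameters.get("charset");
        if (name == null) {
            return StandardCharsets.UTF_8; //no charset declared
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            //covers both illegal and unsupported charset names
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset(Map.of("charset", "ISO-8859-1")));
        System.out.println(resolveCharset(Map.of()));
    }
}
```

Silently falling back to UTF-8 is a design choice of this sketch; depending on the application it may be preferable to log the unsupported name or to reject the content.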

Main Interfaces and Utility Classes

Note that the "org.apache.stanbol.enhancer.servicesapi" module also provides a set of "*Helper" utility classes (e.g. ContentItemHelper, EnhancementEngineHelper …). Users are strongly encouraged to use the functionality provided by these helpers when working with the corresponding classes of the Stanbol Enhancer.

Enhancement Structure

The enhancement structure of Apache Stanbol is described here in full. It defines the types and properties used for the resulting metadata graph of the Stanbol Enhancer.

The enhancement structure defines three main types of Annotations:

In addition, all annotations created by the Stanbol Enhancer provide further meta information defined by the Enhancement class.

Enhancement Properties

since 0.12.1

Enhancement Properties allow the enhancement process of a ContentItem to be parametrized. In contrast to the configuration of Enhancement Engines, which is bound to the component life cycle, enhancement properties can be defined for an Enhancement Chain or passed with individual enhancement requests as query parameters.
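
As an illustration of passing properties with a single request, the sketch below appends a set of key/value pairs to an endpoint URL as URL-encoded query parameters. The property names used here are purely hypothetical placeholders and are not taken from the Stanbol documentation; the class and method names are likewise illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch; the property names are hypothetical placeholders.
final class EnhancementPropertyQuerySketch {

    /** Appends the given properties to the endpoint as query parameters. */
    static String withProperties(String endpoint, Map<String, String> properties) {
        if (properties.isEmpty()) {
            return endpoint;
        }
        String query = properties.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        //extend an existing query string instead of starting a second one
        return endpoint + (endpoint.contains("?") ? "&" : "?") + query;
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("hypothetical.property", "10"); //placeholder name
        System.out.println(withProperties("http://localhost:8080/enhancer", properties));
    }
}
```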

List of Available Enhancement Engines

Apache Stanbol comes with a list of enhancement engine implementations. These engines are supported by the Apache Stanbol community. If you would like to implement your own enhancement engine, continue reading this documentation.