This project has retired. For details please refer to its Attic page.

Using Apache Stanbol for enhancing textual content

To enhance content, you POST plain text to the Enhancement Engines and get back enhancement data. The enhancement process is stateless: neither your content item nor the enhancements are stored.

You can test this via the [web interface of the engines][stan-engines] or from the console with curl:

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
--data "The Stanbol enhancer can detect famous cities such as Paris \
and people such as Bob Marley." http://localhost:8080/engines

or by submitting the text examples delivered with Stanbol:

for file in enhancer/data/text-examples/*.txt; do
  curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" -T "$file" http://localhost:8080/engines
done
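
The same request can also be issued from a program. The following is a minimal sketch in Python using only the standard library. It assumes a Stanbol instance listening at http://localhost:8080/engines, as in the curl calls above; the helper names `build_enhance_request` and `enhance` are illustrative and not part of Stanbol.

```python
import urllib.request

# Endpoint taken from the curl examples above; adjust to your installation.
ENGINES_URL = "http://localhost:8080/engines"


def build_enhance_request(text, url=ENGINES_URL, accept="text/turtle"):
    """Build (but do not send) a POST request carrying plain text.

    Hypothetical helper: it mirrors the headers used in the curl example.
    """
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": accept, "Content-Type": "text/plain"},
        method="POST",
    )


def enhance(text, url=ENGINES_URL):
    """Send the text to the enhancer and return the response body as a string."""
    with urllib.request.urlopen(build_enhance_request(text, url)) as response:
        return response.read().decode("utf-8")
```

With a running instance, `enhance("The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.")` should return the enhancement graph serialized as Turtle. Building the request separately from sending it makes the headers easy to inspect without a server.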

Content items in formats other than plain text can be tested via the [web interface of contenthub][stan-contenthub] or from the console by attaching files (the Metaxa Engine needs to be activated).

Using the enhancement engines

Apache Stanbol starts with a number of active enhancement engines by default. You can activate or deactivate engines, and configure them to your needs, via the [OSGi administration console][stan-admin].

The enhancement process follows a workflow of five phases: pre-processing, content-extraction, extraction-enhancement, default and post-processing.
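
As an illustration of what such a phased workflow implies, the sketch below sorts a set of engines by phase. It is not Stanbol's implementation: the phase names come from the list above, while the engine names and the `order_engines` helper are hypothetical.

```python
# Phase names as listed above, in workflow order.
PHASES = ["pre-processing", "content-extraction", "extraction-enhancement",
          "default", "post-processing"]


def order_engines(engines):
    """Sort (engine_name, phase) pairs by workflow phase.

    sorted() is stable, so engines in the same phase keep their input order.
    """
    return sorted(engines, key=lambda engine: PHASES.index(engine[1]))


# Hypothetical engine names, used only to show the ordering.
engines = [
    ("cleanup", "post-processing"),
    ("linker", "extraction-enhancement"),
    ("language-detector", "pre-processing"),
    ("entity-extractor", "content-extraction"),
]
for name, phase in order_engines(engines):
    print(f"{phase}: {name}")
```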

The following pre-processing engines are available:

For content extraction / natural language processing one engine is available:

The extracted items will then be enhanced by a dedicated engine:

Specific additional enhancement engines are:

For post-processing the results of the enhancement engines:

Using an index of linked open data locally

The pre-configured indexes can be downloaded from [here][stan-download]. You will get two files for each index:

If you copy the ZIP archive into the "/sling/datafiles" folder before installing the bundle, the data will be used automatically during the installation of the bundle. If you provide the file after installing the bundle, you will need to restart the SolrYard installed by the bundle.

The jar can be installed in any OSGi environment running the Apache Stanbol Entityhub. When started, it will create and configure:

This bundle does not contain the indexed data but only the configuration for the Solr Index.

If you have not copied the archive beforehand, the ZIP archive will be requested by the Apache Stanbol Data File Provider after installing the bundle. To install the data, you need to copy this file to the "/sling/datafiles" folder within the working directory of your Stanbol server.
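
The copy step can be scripted. The following is a minimal sketch that assumes the "sling/datafiles" folder sits directly under the server's working directory, as described above; the function name and both arguments are placeholders for your own archive path and Stanbol working directory.

```python
import shutil
from pathlib import Path


def install_index_archive(archive, stanbol_workdir):
    """Copy an index ZIP archive into <workdir>/sling/datafiles.

    Both arguments are placeholders: `archive` is the downloaded ZIP file and
    `stanbol_workdir` is the working directory of the Stanbol server.
    """
    datafiles = Path(stanbol_workdir) / "sling" / "datafiles"
    datafiles.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = datafiles / Path(archive).name       # keep the original file name
    shutil.copy2(archive, target)
    return target
```

The function only places the file. If the bundle is already installed, the SolrYard still needs to be restarted as noted above.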

Note: {name} denotes the value you configured for the "name" property within the "indexing.properties" file.

Enhancement Example

Submitting the text "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley." with the default configuration of enhancement engines and a local index of DBpedia entities produces an output graph containing several Entity Annotations and Text Annotations.

Two of the relevant fragments for "Paris" are listed below in Turtle syntax:

Example for Text Annotation

<urn:enhancement-4a2543d8-4d83-43ce-3a33-2924f457c872>
  a       <http://fise.iks-project.eu/ontology/TextAnnotation> , 
          <http://fise.iks-project.eu/ontology/Enhancement> ;

  <http://fise.iks-project.eu/ontology/confidence>
          "0.9322403510215739"^^<http://www.w3.org/2001/XMLSchema#double> ;

  <http://fise.iks-project.eu/ontology/end>
          "59"^^<http://www.w3.org/2001/XMLSchema#int> ;

  <http://fise.iks-project.eu/ontology/extracted-from>
          <urn:content-item-sha1-37c8a8244041cf6113d4ee04b3a04d0a014f6e10> ;

  <http://fise.iks-project.eu/ontology/selected-text>
          "Paris"^^<http://www.w3.org/2001/XMLSchema#string> ;

  <http://fise.iks-project.eu/ontology/selection-context>
          "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley."
          ^^<http://www.w3.org/2001/XMLSchema#string> ;

  <http://fise.iks-project.eu/ontology/start>
          "54"^^<http://www.w3.org/2001/XMLSchema#int> ;

  <http://purl.org/dc/terms/created>
          "2012-02-29T11:18:36.282Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;

  <http://purl.org/dc/terms/creator>
          "org.apache.stanbol.enhancer.engines.opennlp.impl.NEREngineCore"
          ^^<http://www.w3.org/2001/XMLSchema#string> ;

  <http://purl.org/dc/terms/type>
          <http://dbpedia.org/ontology/Place> .
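
In this example the `start` and `end` values read as zero-based character offsets into the submitted text, with `end` exclusive. That interpretation is an inference from the values shown here, not a statement of the ontology's definition. A short check in Python:

```python
text = ("The Stanbol enhancer can detect famous cities such as Paris "
        "and people such as Bob Marley.")

# Values copied from the Text Annotation above.
start, end = 54, 59
selected_text = "Paris"

# Treating start as inclusive and end as exclusive, a plain slice
# recovers the mention.
mention = text[start:end]
print(mention)  # prints: Paris
```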

Example for Entity Annotation

<urn:enhancement-b5e71f70-4978-a70b-7111-8d6e31283a58>
  a       <http://fise.iks-project.eu/ontology/EntityAnnotation> , 
          <http://fise.iks-project.eu/ontology/Enhancement> ;

  <http://fise.iks-project.eu/ontology/confidence>
          "1323049.5"^^<http://www.w3.org/2001/XMLSchema#double> ;

  <http://fise.iks-project.eu/ontology/entity-label>
           "Paris"@en ;

  <http://fise.iks-project.eu/ontology/entity-reference>
           <http://dbpedia.org/resource/Paris> ;

  <http://fise.iks-project.eu/ontology/entity-type>
           <http://www.w3.org/2002/07/owl#Thing> , 
           <http://www.opengis.net/gml/_Feature> , 
           <http://dbpedia.org/ontology/Place> , 
           <http://dbpedia.org/ontology/Settlement> , 
           <http://dbpedia.org/ontology/PopulatedPlace> ;

  <http://fise.iks-project.eu/ontology/extracted-from>
           <urn:content-item-sha1-37c8a8244041cf6113d4ee04b3a04d0a014f6e10> ;

  <http://purl.org/dc/terms/created>
           "2012-02-29T11:18:36.320Z"
           ^^<http://www.w3.org/2001/XMLSchema#dateTime> ;

  <http://purl.org/dc/terms/creator>
           "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine"
           ^^<http://www.w3.org/2001/XMLSchema#string> ;

  <http://purl.org/dc/terms/relation>
           <urn:enhancement-4a2543d8-4d83-43ce-3a33-2924f457c872> .