Getting Started
This tutorial targets developers who want to enrich unstructured text with "named entity" tags (locations, persons or organizations such as "Paris", "Barack Obama" or "BBC"). Apache Stanbol can provide such enhancements together with links to public (e.g. DBpedia) or private (e.g. an enterprise-specific terminology) repositories.
Build and run your Apache Stanbol instance
To build Apache Stanbol from source you need Java 6 and Maven 3.0.3 or later (the exact version is defined in the pom). You will probably also need to give Maven more memory:
% export MAVEN_OPTS="-Xmx1024M -XX:MaxPermSize=256M"
Fetch the sources from the Apache Stanbol code repository
% svn co http://svn.apache.org/repos/asf/stanbol/trunk stanbol
From the source directory run
% mvn clean install
Run the stable launcher of Apache Stanbol on your local machine from the directory {root}/stanbol/launchers/ with
% java -Xmx1g -jar stable/target/org.apache.stanbol.launchers.stable-{snapshot-version}-SNAPSHOT.jar
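Because the JAR name contains a version placeholder, it can be convenient to look the file up instead of typing it. The following is a minimal Python sketch, not part of Stanbol: the helper names `find_launcher_jar` and `launch_command` are hypothetical, and the sketch assumes the build produced the JAR under `<launchers>/<flavor>/target/` as in the command above.

```python
import glob
import os


def find_launcher_jar(launchers_dir, flavor="stable"):
    """Return the most recently built launcher JAR, or None if none exists."""
    pattern = os.path.join(
        launchers_dir, flavor, "target",
        "org.apache.stanbol.launchers.%s-*.jar" % flavor,
    )
    # Skip any "-sources" JAR that may sit next to the launcher.
    candidates = [p for p in glob.glob(pattern) if not p.endswith("-sources.jar")]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def launch_command(jar_path, heap="1g"):
    """Build the java command line from the tutorial as an argument list."""
    return ["java", "-Xmx" + heap, "-jar", jar_path]
```

The resulting list can be passed to `subprocess.call` from the launchers directory; if `find_launcher_jar` returns None, the build has most likely not been run yet.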
Your instance runs within the stanbol/sling/ directory and is accessible at
http://localhost:8080
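The launcher may need a little while before it answers requests. A script that continues with the next steps can poll the address first. This is only a sketch in Python: the function names, the number of attempts and the delay are assumptions chosen for illustration, and any HTTP response (even an error status) is treated as "the server is up".

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, ...
        return False


def wait_until_up(url="http://localhost:8080", attempts=30, delay=2.0, probe=is_up):
    """Poll `url` until `probe` succeeds; return False after `attempts` tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

The `probe` parameter exists so that the polling loop can be exercised without a running server.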
Post content item, get an enhancement graph
Go to the local HTTP web endpoint
http://localhost:8080/enhancer
This stateless interface allows the caller to submit content to the Apache Stanbol enhancer engines and get the resulting enhancements formatted as RDF at once without storing anything on the server-side.
Simply paste arbitrary English text into the input field, for example a sentence that mentions Bob Marley and Paris, and you get back the enhancements for these entities together with the enhancement graph. If you want to work with the REST interface directly, you can also post the text with the cURL command below. The resulting enhancement RDF will be in Turtle notation.
% curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
    --data "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley." \
    http://localhost:8080/enhancer
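The same request can be sent from a program. The sketch below mirrors the cURL call with Python's standard library; the function names are hypothetical, and encoding the text as UTF-8 is an assumption, since the cURL example does not state a charset.

```python
import urllib.request

ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=ENHANCER_URL, accept="text/turtle"):
    """Build the POST request without sending it."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # assumption: the server accepts UTF-8 text
        headers={"Accept": accept, "Content-Type": "text/plain"},
        method="POST",
    )


def enhance(text, url=ENHANCER_URL, accept="text/turtle"):
    """Send the text to the enhancer and return the response body as a string."""
    request = build_enhancer_request(text, url, accept)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Usage, with a running instance:
#   print(enhance("The Stanbol enhancer can detect famous cities such as "
#                 "Paris and people such as Bob Marley."))
```

Splitting request construction from sending keeps the header handling easy to inspect without a server.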
Configuration
The "default" enhancement chain includes the following Enhancement Engines, which are active by default:
- one engine for converting various document formats to plain text,
- one engine for detecting the language of the text,
- one engine for extracting named entities from the content item and
- one engine configured to link the extracted entities to DBpedia entities.
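To see what the linking engine contributes, one can pull the linked entity URIs out of an enhancement response. The sketch below is illustrative Python and rests on two assumptions that are not stated in this tutorial: that the response was requested in N-Triples (one statement per line) rather than Turtle, and that links are expressed with the predicate http://fise.iks-project.eu/ontology/entity-reference. The function name `linked_entities` is hypothetical.

```python
import re

# Assumption: predicate pointing from an entity annotation to the linked entity.
ENTITY_REFERENCE = "http://fise.iks-project.eu/ontology/entity-reference"

# One N-Triples statement with an IRI or blank-node subject,
# an IRI predicate (group 1) and an IRI object (group 2).
_IRI_TRIPLE = re.compile(r"^\s*(?:<[^>]*>|_:\S+)\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def linked_entities(ntriples, predicate=ENTITY_REFERENCE):
    """Return the distinct object IRIs of `predicate`, in order of appearance."""
    found = []
    for line in ntriples.splitlines():
        match = _IRI_TRIPLE.match(line)
        if match and match.group(1) == predicate and match.group(2) not in found:
            found.append(match.group(2))
    return found
```

Statements whose object is a literal, such as labels, do not match the pattern and are skipped. For anything beyond a quick check, a proper RDF parser is the safer choice.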
You can use the OSGi console (http://{yourdomain}:{port}/) (user/pwd: admin/admin) of your running Stanbol instance to activate and configure additional engines. Additional engines provide keyword extraction and better language support, as well as integration with GeoNames, Zemanta or OpenCalais. See the overview of available Apache Stanbol Enhancement Engines.
This version of Apache Stanbol can also manage and locally cache external entity repositories such as DBpedia, and it can use custom vocabularies as linking targets. Read more about this scenario using custom vocabularies.
Advanced: Explore the Apache Stanbol "full" launcher
The full feature set of Apache Stanbol, including experimental components, is available through Apache Stanbol's "full launcher". See the list of all available components and their features.
To start the full launcher, execute its JAR with the following command:
$ java -Xmx1g -XX:MaxPermSize=256m \
    -jar full/target/org.apache.stanbol.launchers.full-{snapshot-version}-SNAPSHOT.jar
Alternatively, to run the full launcher as a WAR, use the following commands:
$ export MAVEN_OPTS="-Xmx1g -XX:MaxPermSize=256m"
$ cd launchers/full-war
$ mvn clean package tomcat7:run
Your instance is then available at http://localhost:8080/stanbol.