This project has retired. For details please refer to its Attic page.
Apache Stanbol - Configure Apache Stanbol to work with multiple languages

The following languages are supported -

Configuration steps

Install your index

DBpedia provides labels in several languages for many of its entities. If you want to use an index built from your own vocabulary instead, first create the index and then add it to your Stanbol instance: copy {yourindex}.solr.zip into your {stanbol-root}/sling/datafiles directory and install the corresponding OSGi bundle through the OSGi admin console.
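
The copy step can be scripted. The following is a minimal shell sketch, not an official Stanbol tool: the function name install_index and the example paths are assumptions made for illustration, and the bundle still has to be installed separately in the OSGi admin console.

```shell
# Hypothetical helper (not part of Stanbol): copy a Solr index archive
# into the datafiles directory of a Stanbol installation.
#   $1 = path to {yourindex}.solr.zip
#   $2 = {stanbol-root}
install_index() {
  index_zip="$1"
  datafiles="$2/sling/datafiles"

  # Refuse to continue if the archive or the target directory is missing.
  if [ ! -f "$index_zip" ]; then
    echo "index archive not found: $index_zip" >&2
    return 1
  fi
  if [ ! -d "$datafiles" ]; then
    echo "datafiles directory not found: $datafiles" >&2
    return 1
  fi

  cp "$index_zip" "$datafiles/"
}

# Example call (both paths are placeholders):
#   install_index ./myindex.solr.zip /opt/stanbol
```

Checking for the target directory first avoids silently copying the archive into a mistyped Stanbol root.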

Make sure that this index contains labels in every language you want to work with and that they are properly indexed.

Build and add the necessary language bundles

To build the language bundles, go to "{stanbol-root}/data/" and run

mvn clean install -P opennlp

The -P opennlp option activates the Maven profile that builds the OpenNLP models for all languages.

After the build, the bundles are available in the folder

{stanbol-root}/data/opennlp/lang/{language}/target

The bundles are named "org.apache.stanbol.data.opennlp.lang.{language}-*.jar".
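
To collect the files to upload, the folder and naming pattern above can be turned into a small shell helper. This is only a sketch: list_language_bundles is a hypothetical function name, and the language codes passed to it (for example de) are assumptions about how the {language} directories are named in your checkout.

```shell
# Hypothetical helper (not part of Stanbol): print the language bundle
# jars that the build produced for the given languages.
#   $1     = {stanbol-root}
#   $2...  = language directory names, e.g. de (assumed values)
list_language_bundles() {
  root="$1"
  shift
  for lang in "$@"; do
    target="$root/data/opennlp/lang/$lang/target"
    for jar in "$target/org.apache.stanbol.data.opennlp.lang.${lang}-"*.jar; do
      # An unmatched glob stays literal, so only print real files.
      if [ -f "$jar" ]; then
        echo "$jar"
      fi
    done
  done
}

# Example call (root path and language codes are placeholders):
#   list_language_bundles /opt/stanbol de es
```

A language that prints nothing has no built bundle in its target folder, which usually means the build for that language did not run or failed.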

Add the bundles via the OSGi admin console in the Bundles tab. The language bundles will fetch and install the corresponding OpenNLP models for the languages you want to use.

Activate LangID engine and KeywordLinkingEngine

Go to the admin console and deactivate the engines you do not need. In particular, deactivate the standard NER engine and the Entity Linking Engines, as they do not support multiple languages. At least two engines need to be active: the LangID engine and the KeywordLinkingEngine.

Configure the KeywordLinkingEngine

The OSGi admin console shows the most relevant configuration options of the KeywordLinkingEngine.

Read the technical description of this Enhancement Engine to learn about more configuration options.

Results

Depending on your linking target dataset, the engine provides enhancement suggestions using labels in your chosen language(s). Note: in the current version of the DBpedia index, the link points to the English version of the resource.
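
To check the results, a text can be posted to the enhancer and the returned enhancements inspected for labels in the expected language. The sketch below rests on several assumptions: Stanbol is running at http://localhost:8080 with the enhancer reachable under /enhancer, curl is installed, and enhance_text and the CURL override variable are hypothetical names introduced here.

```shell
# Hypothetical helper (not part of Stanbol): send plain text to the
# enhancer and print the enhancements as Turtle.
#   $1 = text to analyse
#   $2 = enhancer URL (optional; the default is an assumption)
# Set CURL=echo to print the request arguments instead of sending them.
enhance_text() {
  text="$1"
  url="${2:-http://localhost:8080/enhancer}"
  "${CURL:-curl}" -s -X POST \
    -H "Accept: text/turtle" \
    -H "Content-Type: text/plain" \
    --data-binary "$text" \
    "$url"
}

# Example call (the sentence is arbitrary sample input):
#   enhance_text "Berlin ist die Hauptstadt von Deutschland."
```

If the setup works, the response should contain a language annotation from the LangID engine and entity suggestions from the KeywordLinkingEngine; the exact output depends on the installed index and active engines.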

Examples

This article from October 2011 describes how to deal with multilingual texts.