This project has retired. For details please refer to its Attic page.
Apache Stanbol - Weighted Chain

Weighted Chain

The WeightedChain takes a list of EnhancementEngine names as input and uses the "org.apache.stanbol.enhancer.engine.order" metadata of the configured Engines to calculate an ExecutionPlan.

This chain is designed for easy configuration - just a list of the engine names - but offers only limited control over the execution order.

Configuration

The property "stanbol.enhancer.chain.weighted.chain" is used to provide the list of engine names. Both arrays and collections are supported as values.

In addition, engines can be marked as optional. The enhancement process then does not fail if such an engine is not active or fails while processing a content item.

An engine is marked as optional with either of the following two variants, which are equivalent:

<name>;optional
<name>;optional=true
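
The entry syntax above can be illustrated with a small parser. This is only a sketch in Python, not Stanbol's own parsing code: the function names `parse_chain_entry` and `is_optional` are hypothetical, and the sketch assumes the entry has already been read from the configuration (that is, any .cfg escaping has been removed).

```python
def parse_chain_entry(entry):
    """Split one chain entry into (engine_name, parameters).

    Hypothetical helper for illustration; not part of the Stanbol API.
    """
    parts = [p.strip() for p in entry.split(";")]
    name, params = parts[0], {}
    for part in parts[1:]:
        if not part:
            continue
        key, sep, value = part.partition("=")
        # A bare key such as "optional" is treated like "optional=true".
        params[key.strip()] = value.strip() if sep else "true"
    return name, params


def is_optional(params):
    """Return True if the parsed parameters mark the engine as optional."""
    return params.get("optional", "false").lower() == "true"


for entry in ("tika;optional", "tika;optional=true", "opennlp-pos"):
    name, params = parse_chain_entry(entry)
    print(name, "optional" if is_optional(params) else "required")
```

Both variants yield the same result for the engine, while an entry without the parameter is treated as required.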

The following figure shows the configuration dialog of a WeightedChain configured with two required engines and one optional engine.

Configuration dialog for the WeightedChain

Enhancement Properties Support

since 0.12.1

Starting with version 0.12.1, the Weighted Chain supports the configuration of EnhancementProperties.

All EnhancementProperties configured with a Chain are written as RDF to the ExecutionPlan. Chain scoped properties are added directly to the ep:ExecutionPlan instance, while chain and engine scoped properties are added to the ep:ExecutionNode of the corresponding engine.

The following figure and listing provide an example.

WeightedChain including some Enhancement Properties

The figure shows that for the dbpedia-fst engine the maximum number of suggestions is set to 10 and the minimum confidence value is set to 0.8. For the dbpedia-dereference engine the dereferenced languages are set to English, German and Spanish. Finally, a chain scoped property sets the maximum number of suggestions for the whole chain to 5. However, this has no effect on the dbpedia-fst engine, because its engine-specific configuration overrides the chain-wide property.

The following listing shows the exact same configuration in the .cfg format.

stanbol.enhancer.chain.name="dbpedia-linking"
stanbol.enhancer.chain.weighted.chain=["tika;optional","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-chunker",
    "dbpedia-fst;\ enhancer.max-suggestions\=10;\ enhancer.min-confidence\=0.8",
    "dbpedia-dereference;\ enhancer.engines.dereference.languages\=en,de,es"]
stanbol.enhancer.chain.chainproperties=["enhancer.max-suggestions\=5"]
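
The precedence rule described above (an engine scoped value overrides the chain scoped value) can be modelled as a simple dictionary merge. The following Python sketch only illustrates that rule with the values from the listing; the function name `effective_properties` is hypothetical, and how Stanbol resolves the properties internally is not shown here.

```python
def effective_properties(chain_props, engine_props):
    """Merge chain scoped and engine scoped properties.

    Engine scoped values win over chain scoped ones (illustrative only).
    """
    merged = dict(chain_props)    # start with the chain-wide defaults
    merged.update(engine_props)   # engine-specific values override them
    return merged


# Values taken from the listing above.
chain_props = {"enhancer.max-suggestions": "5"}
engine_props = {
    "dbpedia-fst": {
        "enhancer.max-suggestions": "10",
        "enhancer.min-confidence": "0.8",
    },
    "dbpedia-dereference": {
        "enhancer.engines.dereference.languages": "en,de,es",
    },
}

for engine, props in engine_props.items():
    print(engine, effective_properties(chain_props, props))
```

For dbpedia-fst the merged value of enhancer.max-suggestions stays at 10, while dbpedia-dereference, which defines no value of its own, sees the chain-wide value of 5.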

Calculation of the ExecutionPlan

Note that the order of the list has no influence on the ExecutionPlan: the execution order of the configured EnhancementEngines is calculated solely from the value of the "org.apache.stanbol.enhancer.engine.order" property provided by each EnhancementEngine.

The WeightedChain follows exactly the same algorithm as the WeightedJobManager, which decides the execution order of all active EnhancementEngines. However, the WeightedChain considers only the configured engines and ignores all others.
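
The two points made in this section (list order is irrelevant, and only configured engines are considered) can be sketched in a few lines of Python. This is a simplified model, not the WeightedJobManager algorithm itself: the order values are hypothetical, the sketch assumes that engines with a higher order value run first, and the tie-break by name is added only to make the output deterministic. The real ExecutionPlan is an RDF structure rather than a flat list.

```python
def execution_order(configured, order_of):
    """Sort the configured engine names by their order value.

    Assumption: higher order values are executed first.
    Engines that are known but not configured are ignored.
    """
    return sorted(configured, key=lambda name: (-order_of[name], name))


# Hypothetical "org.apache.stanbol.enhancer.engine.order" values.
order_of = {
    "tika": 100,
    "opennlp-token": 50,
    "dbpedia-fst": 0,
    "some-other-engine": 75,  # active, but not part of the chain
}

first = execution_order(["dbpedia-fst", "tika", "opennlp-token"], order_of)
second = execution_order(["opennlp-token", "dbpedia-fst", "tika"], order_of)
print(first)
print(first == second)
```

Both calls produce the same sequence, and the engine that is not listed in the chain never appears in the result.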

The following image shows the ExecutionPlan as calculated based on the above configuration.

ExecutionPlan for the keyword chain

If some of the Enhancement Engines are not available, this is visualized as follows. If you send content via the RESTful interface, similar information is available in the Execution Metadata included in the metadata of the enhanced content item.

Optional Engine is inactive

This shows that the optional engine 'metaxa' is currently not available. The chain can still be used, but the functionality provided by this optional engine is missing. In this case only requests for plain text files can be processed.

The next figure shows a situation where a required engine is not active. Requests to this chain will fail until all required engines are active.

Required Engine is inactive
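
The availability rule illustrated by the two figures can be summarized as: a chain is usable as long as every required engine is active, while missing optional engines only reduce functionality. The following Python sketch models that rule; the function `chain_status`, its input format and the engine names other than 'metaxa' are hypothetical and are not taken from the Stanbol API.

```python
def chain_status(entries, active):
    """Report whether a chain is usable.

    entries: list of (engine_name, is_optional) pairs
    active:  set of names of the currently active engines
    """
    missing_required = [n for n, opt in entries if not opt and n not in active]
    missing_optional = [n for n, opt in entries if opt and n not in active]
    return {
        "usable": not missing_required,
        "missing_required": missing_required,
        "missing_optional": missing_optional,
    }


# Hypothetical chain with one optional and two required engines.
entries = [("metaxa", True), ("langid", False), ("ner", False)]

# Optional engine inactive: the chain stays usable.
print(chain_status(entries, active={"langid", "ner"}))

# Required engine inactive: requests to the chain would fail.
print(chain_status(entries, active={"metaxa", "langid"}))
```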