Weighted Chain
The WeightedChain takes a list of EnhancementEngine names as input and uses the "org.apache.stanbol.enhancer.engine.order" metadata of the configured Engines to calculate an ExecutionPlan.
This chain is designed for easy configuration (just a list of engine names) but offers only limited control over the execution order.
Configuration
The property "stanbol.enhancer.chain.weighted.chain" is used to provide the list of engine names. Both arrays and collections are supported as values.
In addition, it is possible to define engines as optional. This specifies that the enhancement process should not fail if such an engine is not active or fails while processing a content item.
The syntax to define an engine as optional is as follows (both variants make the execution of the engine with the name <name> optional):
<name>;optional
<name>;optional=true
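The two variants can be illustrated with a small parsing sketch. This is not Stanbol's own parser; the function name `parse_engine_entry` is hypothetical, and treating `optional=false` as "required" is an assumption not stated above.

```python
def parse_engine_entry(entry):
    """Split '<name>;optional' or '<name>;optional=true' into (name, optional)."""
    name, *params = [part.strip() for part in entry.split(";")]
    optional = False
    for param in params:
        key, _, value = param.partition("=")
        if key.strip() == "optional":
            # A bare 'optional' flag is treated the same as 'optional=true'.
            # Assumption: any other value (e.g. 'false') means "required".
            optional = value.strip().lower() in ("", "true")
    return name, optional


print(parse_engine_entry("tika;optional"))
print(parse_engine_entry("tika;optional=true"))
print(parse_engine_entry("opennlp-token"))
```

Both optional variants yield the same result, while an entry without the parameter is reported as required.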
The following figure shows the configuration dialog of a WeightedChain configured with two required engines and one optional engine.
Enhancement Properties Support
since 0.12.1
Starting from 0.12.1, the Weighted Chain allows configuring EnhancementProperties:
- chain and engine scoped properties are defined as parameters to the engines with the syntax {engine-name}; {property-name-1}={value-1},{value-2}; {property-name-2}={value-1};
- chain scoped properties can be configured by using the OSGi property key stanbol.enhancer.chain.chainproperties with the syntax {property-name-1}={value-1},{value-2}. NOTE that ; is NOT supported as a separator for multiple properties, as OSGi configurations already define a way to provide multiple values.
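As a rough illustration of the engine parameter syntax, the following sketch splits one entry into the engine name and its properties. It is not the Stanbol implementation; the function name `parse_engine_properties` is hypothetical, all values are kept as lists of strings, and any escaping required by a specific configuration file format is assumed to have been resolved already.

```python
def parse_engine_properties(entry):
    """Parse '{engine-name}; {prop}={v1},{v2}; ...' into (name, {prop: [values]})."""
    name, *params = [part.strip() for part in entry.split(";")]
    properties = {}
    for param in params:
        key, sep, value = param.partition("=")
        key = key.strip()
        # Skip empty parts (trailing ';') and the 'optional' flag, which is
        # not an enhancement property.
        if not key or not sep or key == "optional":
            continue
        properties[key] = [v.strip() for v in value.split(",")]
    return name, properties


name, props = parse_engine_properties(
    "dbpedia-dereference; enhancer.engines.dereference.languages=en,de,es"
)
print(name, props)
```

A property with several comma-separated values ends up as a multi-element list, a single value as a one-element list.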
All EnhancementProperties configured with a Chain are written as RDF to the ExecutionPlan. Chain scoped properties are directly added to the ep:ExecutionPlan instance, while chain and engine scoped properties are added to the ep:ExecutionNode of the corresponding engine.
The following figure and listing provide an example.
The figure shows that for the dbpedia-fst engine the maximum number of suggestions is set to 10. Also, the minimum confidence value is set to 0.8. For the dbpedia-dereference engine the dereferenced languages are set to English, German and Spanish. Finally, a chain scoped property is used to set the maximum number of suggestions for the whole chain to 5. However, this has no effect on the dbpedia-fst engine, as its custom configuration overrides this chain-wide property.
The following listing shows the exact same configuration in the .cfg format.
stanbol.enhancer.chain.name="dbpedia-linking"
stanbol.enhancer.chain.weighted.chain=["tika;optional","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-chunker", "dbpedia-fst;\ enhancer.max-suggestions\=10;\ enhancer.min-confidence\=0.8", "dbpedia-dereference;\ enhancer.engines.dereference.languages\=en,de,es"]
stanbol.enhancer.chain.chainproperties=["enhancer.max-suggestions\=5"]
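The precedence described for this example (engine scoped values override chain scoped ones) can be sketched as a simple merge. This only models the described outcome and is not how Stanbol evaluates the ExecutionPlan internally; the function name `effective_properties` is hypothetical.

```python
def effective_properties(chain_props, engine_props):
    """Return the properties an engine would see: chain scope first,
    then engine scope, so engine values replace chain values."""
    merged = dict(chain_props)
    merged.update(engine_props)
    return merged


# Values taken from the example configuration.
chain_props = {"enhancer.max-suggestions": ["5"]}
fst = effective_properties(
    chain_props,
    {"enhancer.max-suggestions": ["10"], "enhancer.min-confidence": ["0.8"]},
)
deref = effective_properties(
    chain_props,
    {"enhancer.engines.dereference.languages": ["en", "de", "es"]},
)
print(fst["enhancer.max-suggestions"])    # engine scoped value
print(deref["enhancer.max-suggestions"])  # chain scoped value
```

For dbpedia-fst the engine scoped value 10 is used, while dbpedia-dereference falls back to the chain-wide value 5.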
Calculation of the ExecutionPlan
It is important to note that the ordering of the list has no influence on the ExecutionPlan because the order of execution of the configured EnhancementEngines is calculated only by using the value of the "org.apache.stanbol.enhancer.engine.order" property provided by the EnhancementEngine:
- Engines with a lower order are executed before engines with a higher value
- Engines with the same order may be executed simultaneously if the EnhancementJobManager and the EnhancementEngine do support this feature.
The WeightedChain follows exactly the same algorithm the WeightedJobManager uses to decide the execution order of all active EnhancementEngines. However, the WeightedChain only considers the engines configured for the chain and ignores all others.
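The ordering rules above can be sketched as grouping the configured engines by their order value. This is a simplified model, not the actual Stanbol algorithm; the function name `execution_stages`, the engine names, and the order values are all hypothetical.

```python
from itertools import groupby


def execution_stages(engine_orders, configured):
    """Group configured engines into stages by ascending order value.

    engine_orders maps engine name -> order value; configured is the list
    of engine names from the chain configuration. Engines in the same
    stage share an order value and could run simultaneously.
    """
    selected = sorted(
        (order, name) for name, order in engine_orders.items() if name in configured
    )
    return [
        [name for _, name in group]
        for _, group in groupby(selected, key=lambda item: item[0])
    ]


# Hypothetical order values for illustration only.
orders = {"engine-a": 10, "engine-b": -5, "engine-c": 10, "engine-d": 0}
print(execution_stages(orders, ["engine-c", "engine-a", "engine-b"]))
```

Reordering the configured list does not change the result, and engine-d is ignored because it is not part of the chain.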
The following image shows the ExecutionPlan as calculated based on the above configuration.
If some of the Enhancement Engines are not available, this will be visualized as follows. If you parse content by using the RESTful interface, similar information will be available via the Execution Metadata included in the metadata of the enhanced content item.
This shows that the optional engine 'metaxa' is currently not available. The chain can still be used; however, the functionality provided by this optional engine will not be available. In this case, only requests for plain text files can be processed.
The next figure shows a situation where a required engine is not active. Requests to this chain will fail until all required engines are active.
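The difference between a missing optional engine and a missing required engine can be summarized in a short sketch. It only models the behaviour described above; the function name `chain_status`, the result keys, and the engine names are hypothetical.

```python
def chain_status(entries, active_engines):
    """Report whether a chain can run, given (name, optional) entries
    and the set of currently active engine names."""
    missing_required = [
        name for name, optional in entries
        if name not in active_engines and not optional
    ]
    missing_optional = [
        name for name, optional in entries
        if name not in active_engines and optional
    ]
    return {
        # The chain fails only if a required engine is unavailable.
        "executable": not missing_required,
        "missing_required": missing_required,
        "missing_optional": missing_optional,
    }


entries = [("engine-a", True), ("engine-b", False), ("engine-c", False)]
print(chain_status(entries, {"engine-b", "engine-c"}))
print(chain_status(entries, {"engine-a", "engine-b"}))
```

In the first call only the optional engine is missing, so the chain stays usable; in the second a required engine is missing, so requests would fail until it becomes active.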