Enhancement Chains
An Enhancement Chain defines how content passed to the Stanbol Enhancer is processed. More concretely, it defines which EnhancementEngines are used to process ContentItems and in what order. Chains are not responsible for the actual processing of ContentItems. They provide the ExecutionPlan to the EnhancementJobManager, which does the actual processing of the ContentItem.
In the RESTful API enhancement chains can be accessed by their name under
http://{host}:{port}/{stanbol-path}/enhancer/chain/{chain-name}
Enhancement requests issued to
http://{host}:{port}/{stanbol-path}/enhancer
http://{host}:{port}/{stanbol-path}/engines
are processed by using the default enhancement chain.
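As a rough illustration, the two kinds of endpoint could be addressed from Java with the JDK's java.net.http classes. This is only a sketch: the class ChainRequestSketch, the base URL http://localhost:8080 (an empty stanbol-path), the chain name "demo", the plain-text body and the Accept header are assumptions made for the example and are not prescribed by the text above. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ChainRequestSketch {

    /** Builds the URI of a named chain: {base}/enhancer/chain/{chain-name}. */
    static URI chainUri(String base, String chainName) {
        return URI.create(base + "/enhancer/chain/" + chainName);
    }

    /** Builds the URI that is served by the default chain: {base}/enhancer. */
    static URI defaultChainUri(String base) {
        return URI.create(base + "/enhancer");
    }

    /** Builds (but does not send) a POST request carrying the text to enhance. */
    static HttpRequest enhanceRequest(URI endpoint, String text) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "text/plain") // assumption: plain-text content
                .header("Accept", "text/turtle")      // assumption: Turtle response requested
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080"; // assumption: local launcher
        HttpRequest request = enhanceRequest(chainUri(base, "demo"),
                "Paris is the capital of France.");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would additionally need an HttpClient and a running Stanbol instance. Chain names containing characters that are not valid in a URI path would have to be encoded before being appended.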
When using the Java API, Chains can be looked up as OSGi services. The ChainManager service is designed to ease this by providing an API for accessing Chains by their name. Because Chains do not perform the actual execution but only provide the ExecutionPlan, one also needs to look up an EnhancementJobManager instance to enhance a ContentItem.
    @Reference
    EnhancementJobManager jobManager;
    @Reference
    ChainManager chainManager;

    // enhance a ContentItem ci (its creation is not shown here)
    ContentItem ci;
    // by using the Chain "demo"
    String chainName = "demo";
    Chain chain = chainManager.getChain(chainName);
    if (chain != null) {
        jobManager.enhanceContent(ci, chain);
    } else {
        // Chain with name "demo" is not active
    }
    // the enhancement results are now available in the metadata
    MGraph enhancementResults = ci.getMetadata();
To enhance a ContentItem with the default chain, the "enhanceContent(ContentItem ci)" method of the EnhancementJobManager can be used.
Chain Interface
The Chain interface is very simple. It defines just the following three methods and one constant:
    /** Getter for the name of the Chain */
    + getName() : String
    /** Getter for the execution plan */
    + getExecutionPlan() : Graph
    /** Getter for the names of the Engines referenced by this Chain */
    + getEngines() : Set<String>
    /** Constant for the property used for the name of the Chain */
    + PROPERTY_NAME : String
Each Chain has a name assigned. This is typically provided by the chain configuration and MUST be set as the value of the property "stanbol.enhancer.chain.name" of the service registration. The getter for the name MUST return the same value. Chain implementations will usually get the name by calling
    // ctx is the ComponentContext passed to the activate method
    this.name = (String) ctx.getProperties().get(Chain.PROPERTY_NAME);
within the activate method of the Chain. There is also an AbstractChain implementation provided by the servicesapi module of the Stanbol Enhancer that already implements this functionality.
The getEngines method returns the names of all EnhancementEngines referenced by a Chain. Note that this method returns a Set. It is intended to allow fast access to the referenced engines and does not provide any information about the execution order.
Components that need to know the details about a Chain need to process the ExecutionPlan returned by the getExecutionPlan() method. The ExecutionPlan is represented as an RDF graph following the ExecutionPlan ontology. It formally describes how a ContentItem must be processed by the EnhancementJobManager. For details see the documentation for the ExecutionPlan.
For any Chain implementation it is important that the returned Graph holding the execution plan MUST BE read-only AND final. This means that a change in the configuration of a Chain MUST NOT change a graph already returned by a call to the getExecutionPlan method.
Because the configuration of a Chain might change at any time, the EnhancementJobManager implementation MUST retrieve the execution plan once and then use this instance for the whole enhancement process. Together with the above requirement that the execution plan is stored in a read-only and final Graph, this ensures that the plan cannot change even for long-lasting enhancement processes. Therefore a change to the configuration of a chain will not influence ongoing enhancement processes.
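The contract can be illustrated with a minimal sketch in plain Java. This is not Stanbol's implementation: SnapshotChain is a hypothetical class, the engine names are made up, and the execution plan is modelled as an immutable list of engine names rather than an RDF graph. A configuration change publishes a new plan instead of modifying the existing one, so a plan that was retrieved earlier stays as it was.

```java
import java.util.List;

final class SnapshotChain {

    // Current plan; replaced as a whole on reconfiguration, never modified in place.
    private volatile List<String> executionPlan;

    SnapshotChain(List<String> engines) {
        this.executionPlan = List.copyOf(engines); // immutable copy
    }

    /** Returns the current, immutable plan. */
    List<String> getExecutionPlan() {
        return executionPlan;
    }

    /** Simulates a configuration change by publishing a new immutable plan. */
    void reconfigure(List<String> engines) {
        this.executionPlan = List.copyOf(engines);
    }

    public static void main(String[] args) {
        // Engine names are hypothetical.
        SnapshotChain chain = new SnapshotChain(List.of("langdetect", "ner"));
        List<String> plan = chain.getExecutionPlan(); // retrieved once, as a job manager would
        chain.reconfigure(List.of("langdetect"));     // configuration changes mid-process
        System.out.println(plan);                     // prints [langdetect, ner]
        System.out.println(chain.getExecutionPlan()); // prints [langdetect]
    }
}
```

The process that started with the first plan keeps working on it, while requests that start after the reconfiguration see the new plan.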
Enhancement Chain Management
This section describes how Enhancement Chains are managed by the Stanbol Enhancer and how they can be selected/accessed. It also describes how the "default" Chain is determined.
For every Stanbol Enhancer at least one Chain MUST BE present. If this is not the case, enhancement requests MUST throw a ChainException with an appropriate error message. Typically, however, multiple EnhancementChains will be configured.
Chain Name Conflicts
Chains are identified by the value of the "stanbol.enhancer.chain.name" property, that is, the name of the chain. If more than one Chain uses the same name, the normal OSGi procedure for selecting the default service is used. This means that
- the Chain with the highest "service.ranking" and, if several Chains share that ranking,
- the Chain with the lowest "service.id"
will be selected on requests for a given Chain name. The RESTful service API offers no way to call the other chains registered for a given name. However, the ChainManager interface allows access to all registered services for a given name.
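The selection rule can be written down as a comparator. The following is a sketch only: ChainSelectionSketch and its Registration class are hypothetical stand-ins for registered Chain services and their OSGi properties, and are not part of the Stanbol or OSGi API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ChainSelectionSketch {

    /** Hypothetical stand-in for a registered Chain service and its OSGi properties. */
    static final class Registration {
        final String name;    // "stanbol.enhancer.chain.name"
        final int ranking;    // "service.ranking"
        final long serviceId; // "service.id"

        Registration(String name, int ranking, long serviceId) {
            this.name = name;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    // Highest ranking first; ties are broken by the lowest service id.
    static final Comparator<Registration> PREFERENCE =
            Comparator.comparingInt((Registration r) -> r.ranking).reversed()
                    .thenComparingLong(r -> r.serviceId);

    /** Selects the registration that answers requests for the given chain name. */
    static Optional<Registration> select(List<Registration> all, String name) {
        return all.stream()
                .filter(r -> r.name.equals(name))
                .min(PREFERENCE);
    }

    public static void main(String[] args) {
        List<Registration> all = List.of(
                new Registration("demo", 0, 12L),
                new Registration("demo", 10, 15L),
                new Registration("demo", 10, 14L),
                new Registration("other", 100, 3L));
        // Two "demo" chains share ranking 10; the one with the lower id wins.
        System.out.println(select(all, "demo").get().serviceId); // prints 14
    }
}
```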
Default Chain
The second important concept of the Chain management is the definition of the "default chain". The default Chain is used for enhancement requests that do not specify a Chain. This is true for requests to the "/engines" and "/enhancer" RESTful services as well as API calls to the "EnhancementJobManager.enhanceContent(ContentItem ci)" method.
The default Chain is determined by the following rules:
- the Chain with the name "default". If more than one Chain is present with that name, then the above rules for resolving name conflicts apply. If there is none,
- the Chain with the highest "service.ranking". If several have the same ranking,
- the Chain with the lowest "service.id".
If no chain is active, a ChainException with an appropriate message MUST BE thrown.
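These rules can be sketched in a few lines of Java. DefaultChainSketch and ActiveChain are hypothetical names introduced for illustration, and an IllegalStateException stands in for the ChainException, so this is not the ChainManager implementation.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class DefaultChainSketch {

    /** Hypothetical stand-in for an active Chain service and its OSGi properties. */
    static final class ActiveChain {
        final String name;
        final int ranking;    // "service.ranking"
        final long serviceId; // "service.id"

        ActiveChain(String name, int ranking, long serviceId) {
            this.name = name;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    // Highest ranking first; ties are broken by the lowest service id.
    static final Comparator<ActiveChain> PREFERENCE =
            Comparator.comparingInt((ActiveChain c) -> c.ranking).reversed()
                    .thenComparingLong(c -> c.serviceId);

    /** Determines the default chain among the active chains. */
    static ActiveChain defaultChain(List<ActiveChain> active) {
        if (active.isEmpty()) {
            // Stand-in for the ChainException required by the text.
            throw new IllegalStateException("No enhancement chain is active");
        }
        // Chains named "default" take precedence, whatever their ranking.
        List<ActiveChain> named = active.stream()
                .filter(c -> "default".equals(c.name))
                .collect(Collectors.toList());
        return Collections.min(named.isEmpty() ? active : named, PREFERENCE);
    }

    public static void main(String[] args) {
        ActiveChain byName = defaultChain(List.of(
                new ActiveChain("default", Integer.MIN_VALUE, 20L),
                new ActiveChain("demo", 100, 21L)));
        System.out.println(byName.name); // prints default

        ActiveChain byRanking = defaultChain(List.of(
                new ActiveChain("demo", 100, 21L),
                new ActiveChain("other", 100, 19L)));
        System.out.println(byRanking.name); // prints other
    }
}
```

The first call shows that a chain named "default" wins even with the lowest possible ranking. The second shows the fallback to ranking and then service id when no chain carries that name.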
All Stanbol launchers are configured with the Default Chain enabled. This registers itself with the name "default" and the lowest possible service ranking - Integer.MIN_VALUE. This default provides a Chain that considers all currently active EnhancementEngines and sorts them based on their ordering information (see the Calculation of the Execution Plan based on the EnhancementEngine Ordering for details).
ChainManager interface
The ChainManager is the management interface for EnhancementChains that components can use to look up chains by their name. It also provides a getter for the default chain. There is also an OSGi ServiceTracker-like implementation that can be used to track only chains with specific names and to be notified of any change to such chains.
Chain implementations
The following Chain implementations are included within the default Stanbol Enhancer distribution:
- DefaultChain: This implementation includes all currently active EnhancementEngines. If enabled, it registers itself under the name "default" with the service ranking Integer.MIN_VALUE. This makes it the default chain as long as users do not deactivate it or register another chain with the name "default".
- ListChain: Implementation that creates the ExecutionPlan by chaining the EnhancementEngines in the exact order specified by the configured list. This Chain does not support parallel execution of engines.
- WeightedChain: This Chain implementation takes a list of engine names as input and uses the "org.apache.stanbol.enhancer.engine.order" metadata provided by those engines to calculate the ExecutionGraph.
- GraphChain: This Chain implementation is based on an ExecutionGraph provided as configuration.
- SingleEngineChain: An adapter to execute a single EnhancementEngine within a Chain. This type of Chain will not be registered as an OSGi service. Instances will be created on request for single EnhancementEngines and passed directly to the EnhancementJobManager implementation.
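The difference between the ListChain and the WeightedChain can be sketched as follows. This is a simplified illustration and not the actual implementations: ChainOrderingSketch, the engine names and the order values are hypothetical, plans are modelled as plain lists of engine names, and the sketch assumes that engines with a higher order value run earlier, which the text above does not state. Parallel execution of engines is ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class ChainOrderingSketch {

    /** ListChain-like behaviour: engines run in exactly the configured order. */
    static List<String> listOrder(List<String> configured) {
        return List.copyOf(configured);
    }

    /** WeightedChain-like behaviour: engines are sorted by their order value. */
    static List<String> weightedOrder(List<String> configured, Map<String, Integer> order) {
        List<String> sorted = new ArrayList<>(configured);
        // Assumption: higher order values run first; missing values default to 0.
        // List.sort is stable, so engines with equal values keep their configured order.
        sorted.sort(Comparator.comparingInt((String e) -> order.getOrDefault(e, 0)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical engine names and order values.
        List<String> configured = List.of("ner", "linking", "langdetect");
        Map<String, Integer> order = Map.of("langdetect", 100, "ner", 0, "linking", -100);

        System.out.println(listOrder(configured));            // prints [ner, linking, langdetect]
        System.out.println(weightedOrder(configured, order)); // prints [langdetect, ner, linking]
    }
}
```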