This project has retired. For details please refer to its Attic page.
Apache Stanbol - Enhancement Chains

Enhancement Chains

An Enhancement Chain defines how content passed to the Stanbol Enhancer is processed. More concretely, it defines which EnhancementEngines are used to process ContentItems and in what order. Chains are not responsible for the actual processing of ContentItems. They provide the ExecutionPlan to the EnhancementJobManager, which does the actual processing of the ContentItem.

In the RESTful API enhancement chains can be accessed by their name under

http://{host}:{port}/{stanbol-path}/enhancer/chain/{chain-name}

Enhancement requests issued to

http://{host}:{port}/{stanbol-path}/enhancer
http://{host}:{port}/{stanbol-path}/engines

are processed by using the default enhancement chain.
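
As a rough illustration of the URL scheme above, the following sketch builds (but does not send) an HTTP request for a named chain with the JDK's java.net.http API. The helper class EnhancerRequests, the base URL, the chain name "demo", and the choice of a POST request with a text/plain body are assumptions made for this example; they are not stated in the text above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of Stanbol: builds requests for the URL scheme above.
final class EnhancerRequests {

    /** Returns the URI of the named chain, or of the default chain if chainName is null. */
    static URI chainUri(String baseUrl, String chainName) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        String path = chainName == null ? "enhancer" : "enhancer/chain/" + chainName;
        return URI.create(base + path);
    }

    /** Builds a POST request carrying plain-text content (assumed request shape). */
    static HttpRequest enhanceRequest(String baseUrl, String chainName, String text) {
        return HttpRequest.newBuilder(chainUri(baseUrl, chainName))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }
}
```

Against a running instance, such a request could be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Passing null as the chain name targets the default chain endpoint.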

When using the Java API, Chains can be looked up as OSGi services. The ChainManager service eases this by providing an API for accessing Chains by their name. Because Chains do not perform the actual execution but only provide the ExecutionPlan, one also needs to look up an EnhancementJobManager instance to enhance a ContentItem.

@Reference
EnhancementJobManager jobManager;

@Reference
ChainManager chainManager;

//the ContentItem to enhance (created elsewhere)
ContentItem ci;
//the name of the Chain to use
String chainName = "demo";
Chain chain = chainManager.getChain(chainName);
if(chain != null){
    jobManager.enhanceContent(ci,chain);
    //the enhancement results are now available in the metadata
    MGraph enhancementResults = ci.getMetadata();
} else {
    //no Chain with the name "demo" is active
}

To enhance a ContentItem with the default chain, the method "enhanceContent(ContentItem ci)" of the EnhancementJobManager can be used.

Chain Interface

The Chain interface is very simple. It defines just the following three methods and one constant:

/** Getter for the name of the Chain */
+ getName() : String
/** Getter for the execution plan */
+ getExecutionPlan() : Graph
/** Getter for the name of the Engines referenced by this Chain */
+ getEngines() : Set<String>
/** Constant for the property used for the name of the Chain */
+ PROPERTY_NAME : String

Each Chain has a name assigned. This is typically provided by the chain configuration and MUST be set as the value of the property "stanbol.enhancer.chain.name" of the service registration. The getter for the name MUST return the same value. Chain implementations will usually get the name by calling

this.name = (String) ctx.getProperties().get(Chain.PROPERTY_NAME);

within the activate method of the Chain, where ctx is the ComponentContext passed to that method. There is also an AbstractChain implementation provided by the servicesapi module of the Stanbol Enhancer that already implements this functionality.
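
The name handling described above might look roughly like the following sketch. It is not Stanbol's AbstractChain: the class NamedChainSketch is hypothetical, a plain java.util.Dictionary stands in for the properties of the OSGi ComponentContext so that the example runs without an OSGi container, and rejecting a missing or empty name with an IllegalArgumentException is an assumption of this example.

```java
import java.util.Dictionary;

// Hypothetical stand-in for a Chain base class; not Stanbol's AbstractChain.
class NamedChainSketch {
    // Same key as the service property named in the text above.
    static final String PROPERTY_NAME = "stanbol.enhancer.chain.name";

    private String name;

    /** Mimics activate(ComponentContext): props stands in for ctx.getProperties(). */
    void activate(Dictionary<String, Object> props) {
        Object value = props.get(PROPERTY_NAME);
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            // Assumed behavior: refuse to activate without a usable name.
            throw new IllegalArgumentException(
                "Missing or empty property '" + PROPERTY_NAME + "'");
        }
        this.name = (String) value;
    }

    /** Returns the value that was registered under PROPERTY_NAME. */
    String getName() {
        return name;
    }
}
```

Reading the name once during activation keeps the getter consistent with the value of the service registration property.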

The getEngines method returns the names of all EnhancementEngines referenced by a Chain. Note that this method returns a Set. It is intended to allow fast access to the referenced engines and does not provide any information about the execution order.

Components that need to know the details about a Chain need to process the ExecutionPlan returned by the getExecutionPlan() method. The ExecutionPlan is represented as an RDF graph following the ExecutionPlan ontology. It formally describes how a ContentItem must be processed by the EnhancementJobManager. For details see the documentation for the ExecutionPlan.

For any Chain implementation it is important that the returned Graph holding the execution plan MUST be read-only AND final. This means that a change in the configuration of a Chain MUST NOT change a graph already returned by a call to the getExecutionPlan method.

Because the configuration of a Chain might change at any time, the EnhancementJobManager implementation MUST retrieve the execution plan once and then use this instance for the whole enhancement process. Together with the above requirement that the execution plan is stored in a read-only and final Graph, this ensures that the plan cannot change even during long-running enhancement processes. Therefore a change to the configuration of a chain will not influence ongoing enhancement processes.
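
The read-only and final requirement can be sketched as follows. This is a simplified model rather than Stanbol code: the class SnapshotChainSketch and its reconfigure method are hypothetical, and the execution plan is reduced to a list of engine names instead of an RDF graph. The point is only that a reconfiguration creates a new plan object and never touches one that was already handed out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical chain; the execution plan is simplified to a list of engine names.
class SnapshotChainSketch {
    // Replaced as a whole on reconfiguration, never modified in place.
    private volatile List<String> plan;

    SnapshotChainSketch(List<String> engines) {
        reconfigure(engines);
    }

    /** Builds a new read-only plan; plans handed out earlier stay untouched. */
    void reconfigure(List<String> engines) {
        this.plan = Collections.unmodifiableList(new ArrayList<>(engines));
    }

    /** Returns the current read-only plan. */
    List<String> getExecutionPlan() {
        return plan;
    }
}
```

A job manager following the rule above would call getExecutionPlan() once per enhancement request and keep working with that instance, even if reconfigure is called in the meantime.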

Enhancement Chain Management

This section describes how Enhancement Chains are managed by the Stanbol Enhancer and how they can be selected/accessed. It also describes how the "default" Chain is determined.

For every Stanbol Enhancer at least one Chain MUST be present. If this is not the case, enhancement requests MUST throw a ChainException with an appropriate error message. Typically, however, multiple Enhancement Chains will be configured.

Chain Name Conflicts

Chains are identified by the value of the "stanbol.enhancer.chain.name" property, that is, the name of the chain. If more than one Chain uses the same name, the normal OSGi procedure for selecting the default service is applied. This means that

  1. the Chain with the highest "service.ranking" and,
  2. if several Chains share that ranking, the Chain with the lowest "service.id"

will be selected on requests for a given Chain name. The RESTful service API offers no way to call the other chains registered under that name. However, the ChainManager interface allows access to all services registered for a given name.
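
The selection order for Chains sharing a name can be expressed as a comparator. The sketch below is illustrative only: ChainRef and ChainSelection are hypothetical types that model a service registration by its name, ranking, and id, and they are not part of the Stanbol or OSGi APIs.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical view of a registered Chain service; not a Stanbol or OSGi type.
final class ChainRef {
    final String name;   // value of "stanbol.enhancer.chain.name"
    final int ranking;   // value of "service.ranking"
    final long id;       // value of "service.id"

    ChainRef(String name, int ranking, long id) {
        this.name = name;
        this.ranking = ranking;
        this.id = id;
    }
}

final class ChainSelection {
    // Highest service.ranking first; ties are broken by the lowest service.id.
    static final Comparator<ChainRef> PREFERENCE =
        Comparator.comparingInt((ChainRef c) -> c.ranking).reversed()
                  .thenComparingLong(c -> c.id);

    /** Picks the Chain that would answer requests for the given name. */
    static Optional<ChainRef> select(List<ChainRef> registered, String name) {
        return registered.stream()
                .filter(c -> c.name.equals(name))
                .min(PREFERENCE);
    }
}
```

With three Chains named "demo" registered as (ranking 0, id 12), (ranking 10, id 40), and (ranking 10, id 35), the last one is selected: it shares the highest ranking and has the lower id.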

Default Chain

The second important concept of Chain management is the definition of the "default chain". The default Chain is used for enhancement requests that do not specify a Chain. This applies to requests to the "/engines" and "/enhancer" RESTful services as well as to API calls to the "EnhancementJobManager.enhanceContent(ContentItem ci)" method.

The default Chain is determined by the following rules:

  1. the Chain with the name "default". If more than one Chain is present with that name, then the above rules for resolving name conflicts apply. If none,
  2. the Chain with the highest "service.ranking". If several have the same ranking,
  3. the Chain with the lowest "service.id".

If no chain is active, a ChainException with an appropriate message MUST be thrown.
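
The rules for determining the default Chain can be sketched as a small resolver. The types RegisteredChain and DefaultChainResolver are hypothetical and only model a registration by name, ranking, and id. An IllegalStateException stands in for the ChainException mentioned above, since the sketch does not depend on the Stanbol API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical view of a registered Chain service; not a Stanbol or OSGi type.
final class RegisteredChain {
    final String name;
    final int ranking;   // "service.ranking"
    final long id;       // "service.id"

    RegisteredChain(String name, int ranking, long id) {
        this.name = name;
        this.ranking = ranking;
        this.id = id;
    }
}

final class DefaultChainResolver {
    // Highest ranking first, then lowest service.id.
    private static final Comparator<RegisteredChain> PREFERENCE =
        Comparator.comparingInt((RegisteredChain c) -> c.ranking).reversed()
                  .thenComparingLong(c -> c.id);

    /** Applies rules 1 to 3 above; throws if no Chain is registered at all. */
    static RegisteredChain resolve(List<RegisteredChain> registered) {
        // Rule 1: prefer Chains named "default", resolved like any name conflict.
        Optional<RegisteredChain> named = registered.stream()
                .filter(c -> "default".equals(c.name))
                .min(PREFERENCE);
        if (named.isPresent()) {
            return named.get();
        }
        // Rules 2 and 3: otherwise fall back to ranking, then service.id.
        return registered.stream()
                .min(PREFERENCE)
                .orElseThrow(() -> new IllegalStateException("No enhancement chain is active"));
    }
}
```

Note that a Chain named "default" wins even when its ranking is lower than that of every other Chain, because the name is checked before the ranking.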

All Stanbol launchers are configured with the Default Chain enabled. This registers itself with the name "default" and the lowest possible service ranking - Integer.MIN_VALUE. This default provides a Chain that considers all currently active EnhancementEngines and sorts them based on their ordering information (see the Calculation of the Execution Plan based on the EnhancementEngine Ordering for details).

ChainManager interface

The ChainManager is the management interface for Enhancement Chains that components can use to look up chains by their name. It also provides a getter for the default chain. There is also an implementation, modeled on the OSGi ServiceTracker, that can be used to track only chains with specific names and to be notified of any change to those chains.

Chain implementations

The following Chain implementations are included within the default Stanbol Enhancer distribution: