This project has retired. For details please refer to its Attic page.
Apache Stanbol - Enhancement Properties

Enhancement Properties

since version 0.12.1 with STANBOL-488

Enhancement Properties allow the enhancement process of a ContentItem to be parameterized. In contrast to the configuration of Enhancement Engines, which is bound to the component life cycle, enhancement properties can be defined for an Enhancement Chain and for single enhancement requests.

Naming and definition

EnhancementProperties are defined as string keys similar to Java properties. To represent them in RDF, the key is transformed to a URI by using the string key as local name and http://stanbol.apache.org/ontology/enhancementproperties# as namespace. The default namespace prefix for enhancement properties is ehp.
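
The key-to-URI mapping described above can be sketched in a few lines of Java. The class and method names (EhpUris, toUri, toKey) are illustrative assumptions and not part of the Stanbol API; only the namespace is taken from the text.

```java
final class EhpUris {

    /** Namespace for enhancement properties, as given in the text. */
    static final String EHP_NS =
            "http://stanbol.apache.org/ontology/enhancementproperties#";

    /** Builds the property URI by using the string key as local name. */
    static String toUri(String key) {
        return EHP_NS + key;
    }

    /** Recovers the string key from a URI, or null if the URI is in another namespace. */
    static String toKey(String uri) {
        return uri.startsWith(EHP_NS) ? uri.substring(EHP_NS.length()) : null;
    }

    public static void main(String[] args) {
        System.out.println(toUri("enhancer.max-suggestions"));
    }
}
```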

The string key of an Enhancement Property MUST start with enhancer. and SHOULD use the enhancer.{level-1}.{level-2}.{property-name} syntax. Properties are case sensitive and SHOULD only use lower case characters. The '-' character should be used to separate the words of multi-word names so that they are easier to read.

Globally defined properties use enhancer.{property-name}. For Enhancement Engine specific properties, a (possibly shortened or simplified) name of the engine should be used as {level-1}. Engine specific properties might also use engines as {level-1} and the name of the engine as {level-2}.

Examples: enhancer.max-suggestions and enhancer.min-confidence are typical globally defined Enhancement Properties. Properties defined by specific Enhancement Engines look like enhancer.entity-co-mention.adjust-existing-confidence or enhancer.engines.dereference.fields (as defined by STANBOL-1287).
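
The naming rules above can be checked mechanically. The following is a minimal sketch, not code from Stanbol: the class EhpKeyCheck and both method names are hypothetical, and the regular expression assumes that each segment consists of lower case letters or digits with words joined by '-'.

```java
import java.util.regex.Pattern;

final class EhpKeyCheck {

    /** Prefix every enhancement property key MUST start with. */
    static final String REQUIRED_PREFIX = "enhancer.";

    // Assumption: segments are lower case letters or digits, words joined by '-'.
    private static final Pattern RECOMMENDED =
            Pattern.compile("enhancer(\\.[a-z0-9]+(-[a-z0-9]+)*)+");

    /** Checks the MUST rule: the key starts with "enhancer." and names something. */
    static boolean satisfiesMust(String key) {
        return key != null
                && key.startsWith(REQUIRED_PREFIX)
                && key.length() > REQUIRED_PREFIX.length();
    }

    /** Checks the SHOULD rules: dot separated levels, lower case, '-' between words. */
    static boolean followsShould(String key) {
        return key != null && RECOMMENDED.matcher(key).matches();
    }

    public static void main(String[] args) {
        String key = "enhancer.engines.dereference.fields";
        System.out.println(key + " must=" + satisfiesMust(key)
                + " should=" + followsShould(key));
    }
}
```

A key such as enhancer.maxSuggestions passes the MUST check but fails the SHOULD check because of the upper case character.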

Enhancement Properties can also be defined as RDF datatype properties. This makes it possible to specify the XSD data type of expected values.

@prefix ehp: <http://stanbol.apache.org/ontology/enhancementproperties#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ehp:enhancer.max-suggestions a owl:DatatypeProperty ;
    rdfs:range xsd:integer .

ehp:enhancer.min-confidence a owl:DatatypeProperty ;
    rdfs:range xsd:double .

ehp:enhancer.entity-co-mention.adjust-existing-confidence a owl:DatatypeProperty ;
    rdfs:range xsd:double .

ehp:enhancer.engines.dereference.fields a owl:DatatypeProperty ;
    rdfs:range xsd:string .

NOTE that the Java interface will pass enhancement properties as Map<String,Object>. Regardless of the defined data type, Enhancement Engines that support a property MUST be able to parse its values from strings (the lexical form of the RDF literal). Multiple values may be passed as a Java Collection or an array.
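
To illustrate what this requirement means for an engine, here is a small sketch of lenient value handling. It is not the EnhancementEngineHelper implementation; the class EhpValues and its methods are hypothetical helpers that accept typed values as well as their string form, and that treat single values, Collections and arrays uniformly.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class EhpValues {

    /** Normalises a raw value (single value, Collection or array) to a list. */
    static List<Object> asList(Object value) {
        List<Object> out = new ArrayList<>();
        if (value == null) {
            return out;
        }
        if (value instanceof Collection<?>) {
            out.addAll((Collection<?>) value);
        } else if (value.getClass().isArray()) {
            for (int i = 0; i < Array.getLength(value); i++) {
                out.add(Array.get(value, i));
            }
        } else {
            out.add(value);
        }
        return out;
    }

    /** Accepts a Number or the lexical form of an integer. */
    static Integer toInteger(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        return Integer.valueOf(value.toString().trim());
    }

    /** Accepts a Number or the lexical form of a double. */
    static Double toDouble(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        return Double.valueOf(value.toString().trim());
    }

    public static void main(String[] args) {
        System.out.println(toInteger("10") + " " + toDouble("0.75")
                + " " + asList(new String[] {"en", "de", "es"}));
    }
}
```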

Scopes

Enhancement Properties can be defined with the following scopes:

  1. request and engine: Properties with this scope apply to a single request and a specific Enhancement Engine of the executed Enhancement Chain. They have the highest priority and therefore override properties defined with any of the scopes below.
  2. request: Properties valid for a single request that are passed to every Enhancement Engine of the executed Enhancement Chain.
  3. chain and engine: Properties defined for a specific Enhancement Engine of an Enhancement Chain. Like all chain scoped properties, they are applied to all executions of that chain.
  4. chain: Chain specific properties passed to all Enhancement Engines of the Enhancement Chain. Properties of this scope have the lowest priority and are overridden by any property with the same key and one of the scopes above.

Properties with a higher priority override properties with a lower priority. For example, if a property enhancer.min-confidence=0.5 is defined on the chain scope, it can be overridden by enhancer.min-confidence=0.75 on the chain and engine scope. A single request might still override the value on the request or the request and engine scope.
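
The override order of the four scopes can be sketched as a simple map merge, applied from the lowest to the highest priority. This is only an illustration of the rule; the class EhpScopeMerge and the method effective are hypothetical names, and the way Stanbol computes the effective properties internally may differ.

```java
import java.util.HashMap;
import java.util.Map;

final class EhpScopeMerge {

    /** Merges the scopes for one engine; later putAll calls override earlier ones. */
    static Map<String, Object> effective(Map<String, Object> chain,
            Map<String, Object> chainAndEngine,
            Map<String, Object> request,
            Map<String, Object> requestAndEngine) {
        Map<String, Object> merged = new HashMap<>();
        merged.putAll(chain);            // lowest priority
        merged.putAll(chainAndEngine);
        merged.putAll(request);
        merged.putAll(requestAndEngine); // highest priority
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> chain = new HashMap<>();
        chain.put("enhancer.min-confidence", "0.5");
        Map<String, Object> chainAndEngine = new HashMap<>();
        chainAndEngine.put("enhancer.min-confidence", "0.75");
        Map<String, Object> none = new HashMap<>();
        // No request properties given: the chain and engine value wins.
        System.out.println(effective(chain, chainAndEngine, none, none));
    }
}
```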

Chain scoped and chain and engine scoped properties are configured with the Enhancement Chain definition. Request scoped and request and engine scoped properties can be specified as query parameters of the POST request or via the Java API by accessing the Request Properties content part. See the following sections for detailed information.

Using Enhancement Properties

Enhancement Properties are consumed by Enhancement Engines. This section describes how implementors of engines can retrieve the Enhancement Properties of a request, that is, within calls to the computeEnhancements(..) method.

In versions 0.12.1 and 1.*, EnhancementProperties are contained in the ContentItem passed to the EnhancementEngine. The EnhancementEngineHelper utility has methods to access them. The following listing shows the code necessary to get the Enhancement Properties from the passed ContentItem.

@Override
public final void computeEnhancements(ContentItem ci) throws EngineException {
    Map<String,Object> enhancementProps = EnhancementEngineHelper.getEnhancementProperties(this, ci);
    [..]
}

With 2.0.0 the EnhancementEngine API will be changed so that the EnhancementProperties are passed as an additional parameter.

@Override
public final void computeEnhancements(ContentItem ci,
        Map<String,Object> enhancementProps) throws EngineException {
    [..]
}

The Map<String,Object> containing the EnhancementProperties is a readable and writable copy of the EnhancementProperties passed with the ContentItem. This means that EnhancementEngine implementations are free to change the contents of that map. Those changes will not affect the state of the ContentItem.

The keys in the map are the string keys of the passed Enhancement Properties (e.g. enhancer.max-suggestions or enhancer.engines.dereference.fields). Values can be any Object. Arrays and Collections may be used for multi-valued properties. The EnhancementEngineHelper utility provides methods to convert values to the expected type.

//define supported enhancement properties as constants
public static final String MAX_SUGGESTIONS = "enhancer.max-suggestions";
public static final String DEREFERENCED_FIELDS = "enhancer.engines.dereference.fields";

[..]

@Override
public final void computeEnhancements(ContentItem ci) throws EngineException {
    Map<String,Object> enhProp = EnhancementEngineHelper.getEnhancementProperties(this, ci);
    Integer maxSuggestions = EnhancementEngineHelper.getFirstConfigValue(this, ci,
        enhProp, MAX_SUGGESTIONS, Integer.class);

    Collection<String> fields = EnhancementEngineHelper.getConfigValues(this, ci, 
        enhProp, DEREFERENCED_FIELDS, String.class);
}

There are also parseConfig*(..) methods to which one can directly pass the object value. These methods also do not throw an EnhancementPropertyException. Note also the get*ConfigValue(Dictionary<String,Object>, ...) methods that can be used to parse the OSGi component configuration.

Definition of Chain scoped Enhancement Properties

Chain scoped EnhancementProperties are represented as RDF in the ExecutionPlan. Because in 0.12.1 and 1.* the ExecutionPlan is provided by the Chain#getExecutionPlan() method, the most commonly used Chain implementations were extended to support the configuration of chain scoped Enhancement Properties.

Starting from 0.12.1, the ListChain, WeightedChain and GraphChain allow the configuration of EnhancementProperties.

All EnhancementProperties configured with a Chain are written as RDF to the ExecutionPlan. Chain scoped properties are added directly to the ep:ExecutionPlan instance, while chain and engine scoped properties are added to the ep:ExecutionNode of the corresponding engine.

The following figure shows an example of Enhancement Properties configured for a WeightedChain.

[Figure: WeightedChain including some Enhancement Properties]

The figure shows that for the dbpedia-fst engine the maximum number of suggestions is set to 10 and the minimum confidence value is set to 0.8. For the dbpedia-dereference engine the dereferenced languages are set to English, German and Spanish. Finally, a chain scoped property is used to set the maximum number of suggestions for the whole chain to 5. However, this has no effect on the dbpedia-fst engine, as its engine specific configuration overrides this chain-wide property.

The following listing shows the exact same configuration in the .cfg format.

stanbol.enhancer.chain.name="dbpedia-linking"
stanbol.enhancer.chain.weighted.chain=["tika;optional","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-chunker",
    "dbpedia-fst;\ enhancer.max-suggestions\=10;\ enhancer.min-confidence\=0.8",
    "dbpedia-dereference;\ enhancer.engines.dereference.languages\=en,de,es"]
stanbol.enhancer.chain.chainproperties=["enhancer.max-suggestions\=5"]
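
To make the structure of a single chain entry easier to see, the following sketch splits an entry (after the .cfg escaping has been removed) into the engine name and its parameters. This is not Stanbol's own configuration parser: the class ChainEntry is hypothetical, values are kept as raw strings, and mapping a parameter without a value (such as optional) to "true" is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ChainEntry {

    final String engine;
    final Map<String, String> params;

    private ChainEntry(String engine, Map<String, String> params) {
        this.engine = engine;
        this.params = params;
    }

    /** Parses "engine; key=value; flag" into the engine name and its parameters. */
    static ChainEntry parse(String entry) {
        String[] parts = entry.split(";");
        String engine = parts[0].trim();
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            if (part.isEmpty()) {
                continue;
            }
            int eq = part.indexOf('=');
            if (eq < 0) {
                params.put(part, "true"); // assumption: a bare name acts as a flag
            } else {
                params.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return new ChainEntry(engine, params);
    }

    public static void main(String[] args) {
        ChainEntry e = parse(
                "dbpedia-fst; enhancer.max-suggestions=10; enhancer.min-confidence=0.8");
        System.out.println(e.engine + " " + e.params);
    }
}
```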

NOTE: With version 2.* of the enhancer it will be possible to directly pass or refer to an ExecutionPlan as an RDF graph. This will also allow chain scoped enhancement properties to be managed and configured in RDF.

Definition of Request scoped Enhancement Properties

Request scoped and request and engine scoped EnhancementProperties are commonly called Request Properties. They can be passed as query parameters with enhancement requests or set directly on the RequestProperties content part via the Java API.

Request Properties encoding

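Request properties use the following encoding:

  1. request scoped properties: {property-name}
  2. request and engine scoped properties: {engine-name}:{property-name}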

For example, the request property enhancer.max-suggestions=5 would set the maximum number of suggestions for all engines to five. In contrast, the request property dbpedia-fst:enhancer.max-suggestions=10 would set the maximum number of suggestions for the DBpedia FST linking engine to ten. If both request properties are passed, the DBpedia FST linking engine would be allowed to suggest ten entities while all other engines would return at most five suggestions.
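
The effect of this encoding can be sketched as a function that computes the properties visible to one engine from a map of encoded request properties. The class RequestPropertyView and the method forEngine are hypothetical, and the sketch assumes that neither engine names nor property keys contain a ':' character.

```java
import java.util.HashMap;
import java.util.Map;

final class RequestPropertyView {

    /** Returns the request properties that apply to the given engine. */
    static Map<String, Object> forEngine(String engineName, Map<String, Object> encoded) {
        Map<String, Object> result = new HashMap<>();
        // First pass: request scoped keys (no engine prefix) apply to every engine.
        for (Map.Entry<String, Object> e : encoded.entrySet()) {
            if (e.getKey().indexOf(':') < 0) {
                result.put(e.getKey(), e.getValue());
            }
        }
        // Second pass: keys prefixed with "{engine-name}:" override the request scope.
        String prefix = engineName + ":";
        for (Map.Entry<String, Object> e : encoded.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> encoded = new HashMap<>();
        encoded.put("enhancer.max-suggestions", "5");
        encoded.put("dbpedia-fst:enhancer.max-suggestions", "10");
        System.out.println(forEngine("dbpedia-fst", encoded));
        System.out.println(forEngine("some-other-engine", encoded));
    }
}
```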

Parsing Request Properties via the Enhancer RESTful Service

Starting with 0.12.1, Enhancement Properties can be passed as query parameters of enhancement requests. For request scoped properties the property name is used as the parameter name. Request and engine scoped properties need to use {engine-name}:{property-name} as the parameter name.

The following curl request passes one request scoped property and two request and engine scoped properties as query parameters. The URL is quoted so that the shell does not interpret the & characters:

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
    --data "The Eiffel Tower is located in Paris." \
    "http://localhost:8080/enhancer?enhancer.max-suggestions=5&dbpedia-linking:enhancer.min-confidence=0.33&conf-filter:enhancer.min-confidence=0.85"
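
When the request is built from Java rather than from the shell, the query string can be assembled with the JDK alone. The sketch below only builds the URL; the class EnhancerRequestUrl is a hypothetical helper, the endpoint is the one used in the curl example, and percent-encoding the ':' of engine scoped keys assumes that the server decodes parameter names in the usual way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class EnhancerRequestUrl {

    /** Appends the given request properties as query parameters to the endpoint. */
    static String build(String endpoint, Map<String, String> requestProperties) {
        StringBuilder sb = new StringBuilder(endpoint);
        char separator = '?';
        for (Map.Entry<String, String> e : requestProperties.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("enhancer.max-suggestions", "5");
        props.put("dbpedia-linking:enhancer.min-confidence", "0.33");
        System.out.println(build("http://localhost:8080/enhancer", props));
    }
}
```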

Request Properties Java API

In versions 0.12.1 and 1.*, Request Properties (request scoped and request and engine scoped EnhancementProperties) are stored in the ContentPart with the URI urn:apache.org:stanbol.enhancer:request.properties. The ContentItemHelper utility provides methods to retrieve and/or initialize this content part.

The RequestProperties content part uses a simple Map<String,Object>. Keys use the Request Properties encoding. Values can be of any type supported by enhancement properties.

The following code segment provides an example on how to set Request Properties via the Java API.

ContentItem ci; //the content item
Map<String,Object> reqProp = ContentItemHelper.initEnhancementPropertiesContentPart(ci);
//set min confidence to 0.5 for all engines
reqProp.put("enhancer.min-confidence","0.5");
//set max suggestions to 10 for the engine named "linking"
reqProp.put("linking:enhancer.max-suggestions","10");

Note that with the enhancer 2.0 the request properties content part will be removed and replaced by the EnhancementJob API (TBD).

Enhancement Engine Support

Enhancement Properties MUST BE supported by Enhancement Engine implementations.

NOTE: the properties used in the different examples are NOT supported by the 0.12.1 release. The definition of global enhancement properties and support for them in the most commonly used enhancement engines is planned to be added before the 1.0.0 release. The epic STANBOL-1343 tracks the progress. Please also consult the documentation of specific engines for details about supported properties.

The only engine that already supports Enhancement Properties in the 0.12.1 release is the Entityhub Dereference Engine.