Enhancement Properties
since version 0.12.1 with STANBOL-488
Enhancement Properties allow parametrizing the enhancement process of a ContentItem. The configuration of Enhancement Engines is bound to the component life cycle. Enhancement properties, in contrast, can be defined for Enhancement Chains and for single enhancement requests.
Naming and definition
EnhancementProperties are defined as string keys similar to Java properties. To represent them in RDF, each key is transformed into a URI. The string key is used as the local name and http://stanbol.apache.org/ontology/enhancementproperties# as the namespace. The default namespace prefix for enhancement properties is ehp.
The String key of Enhancement Properties MUST start with enhancer. and SHOULD use the enhancer.{level-1}.{level-2}.{property-name} syntax. Properties are case sensitive and SHOULD only use lower case characters. The '-' character shall be used to separate the words of multi-word property names so they are easier to read.
Globally defined properties use 'enhancer.{property-name}'. For Enhancement Engine specific properties, a possibly shortened or simplified name of the engine should be used as {level-1}. Engine specific properties might also use engines as {level-1} and the name of the engine as {level-2}.
For example, enhancer.max-suggestions and enhancer.min-confidence are typical globally defined Enhancement Properties. Properties defined by specific Enhancement Engines look like enhancer.entity-co-mention.adjust-existing-confidence or enhancer.engines.dereference.fields (as defined by STANBOL-1287).
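The naming rules above can be checked mechanically. The following minimal Java sketch uses a hypothetical helper class, EnhancementPropertyKeys, which is not part of the Stanbol API. It rejects keys that do not start with enhancer. (the MUST rule) and maps a key to its URI in the ehp namespace. It also uses a regular expression as one possible reading of the SHOULD rules: lower-case, '.'-separated segments whose words are joined by '-'.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of the Stanbol API) illustrating the naming
 * rules for Enhancement Property keys and their mapping to URIs.
 */
final class EnhancementPropertyKeys {

    /** Namespace of enhancement properties (default prefix "ehp"). */
    static final String NAMESPACE = "http://stanbol.apache.org/ontology/enhancementproperties#";

    /** Every key MUST start with this prefix. */
    static final String PREFIX = "enhancer.";

    // One possible reading of the SHOULD rules: '.'-separated lower-case
    // segments whose words are joined by '-'.
    private static final Pattern RECOMMENDED =
            Pattern.compile("enhancer(\\.[a-z0-9]+(-[a-z0-9]+)*)+");

    private EnhancementPropertyKeys() {}

    /** Uses the key as local name within the ehp namespace. */
    static String toUri(String key) {
        if (!key.startsWith(PREFIX)) {
            throw new IllegalArgumentException("key MUST start with '" + PREFIX + "': " + key);
        }
        return NAMESPACE + key;
    }

    /** True if the key also follows the SHOULD-level naming recommendations. */
    static boolean isRecommended(String key) {
        return RECOMMENDED.matcher(key).matches();
    }
}
```

For instance, enhancer.minConfidence passes the MUST rule but not the recommendation, whereas enhancer.min-confidence passes both.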
Enhancement Properties can also be defined as RDF datatype properties. This allows specifying the XSD data type of the expected values.
    @prefix ehp: <http://stanbol.apache.org/ontology/enhancementproperties#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    ehp:enhancer.max-suggestions a owl:DatatypeProperty ;
        rdfs:range xsd:integer .
    ehp:enhancer.min-confidence a owl:DatatypeProperty ;
        rdfs:range xsd:double .
    ehp:enhancer.entity-co-mention.adjust-existing-confidence a owl:DatatypeProperty ;
        rdfs:range xsd:double .
    ehp:enhancer.engines.dereference.fields a owl:DatatypeProperty ;
        rdfs:range xsd:string .
NOTE that the Java interface passes enhancement properties as Map<String,Object>. Regardless of the defined data type, Enhancement Engines that support a property MUST be able to parse values from strings (the lexical form of the RDF literal). Multiple values may be passed as a Java Collection or an array.
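Values may arrive as typed objects, as strings, as arrays or as Collections, so an engine typically normalizes them before use. The sketch below is a hypothetical helper: PropertyValues is not the EnhancementEngineHelper API. It always converts each value through its lexical (string) form, so "10", 10 and 10L are treated alike. Arrays are read via java.lang.reflect.Array, so primitive arrays work too.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical helper (not the EnhancementEngineHelper API) that accepts
 * property values given as strings, typed objects, arrays or Collections.
 */
final class PropertyValues {

    private PropertyValues() {}

    /** Normalizes a raw value to a list of typed values; null yields an empty list. */
    static <T> List<T> values(Object raw, Function<String, T> parser) {
        if (raw == null) {
            return Collections.emptyList();
        }
        List<Object> items = new ArrayList<>();
        if (raw instanceof Collection) {
            items.addAll((Collection<?>) raw);
        } else if (raw.getClass().isArray()) {
            // works for Object[] as well as primitive arrays such as int[]
            for (int i = 0; i < Array.getLength(raw); i++) {
                items.add(Array.get(raw, i));
            }
        } else {
            items.add(raw);
        }
        List<T> result = new ArrayList<>(items.size());
        for (Object item : items) {
            if (item != null) {
                // parse from the lexical form, as engines MUST support string values
                result.add(parser.apply(item.toString().trim()));
            }
        }
        return result;
    }

    /** Returns the first value, or the default if no value is present. */
    static <T> T first(Object raw, Function<String, T> parser, T defaultValue) {
        List<T> values = values(raw, parser);
        return values.isEmpty() ? defaultValue : values.get(0);
    }
}
```

With this helper, an engine could read enhancer.max-suggestions with first(value, Integer::valueOf, defaultMax) no matter whether the value was configured as "10" or as an Integer.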
Scopes
Enhancement Properties can be defined with the following scopes:
- request and engine: Properties with this scope apply to a single request and to a specific Enhancement Engine of the executed Enhancement Chain. They have the highest priority and therefore override properties defined with any of the scopes below.
- request: Properties valid for a single request that are passed to every Enhancement Engine of the executed Enhancement Chain.
- chain and engine: Properties defined for a specific Enhancement Engine of an Enhancement Chain. Like all chain scoped properties, they are applied to all executions of that chain.
- chain: Chain specific properties passed to all Enhancement Engines of the Enhancement Chain. Enhancement Properties of this scope have the lowest priority. Any property with the same key in one of the scopes above overrides them.
Properties with a higher priority override properties with a lower priority. For example, if the property enhancer.min-confidence=0.5 is defined in the chain scope, enhancer.min-confidence=0.75 in the chain and engine scope can override it. A single request can still override the value in the request or request and engine scope.
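The priority rules amount to merging the four scopes from lowest to highest priority, where later entries replace earlier ones with the same key. The following is a minimal sketch. It assumes the properties of each scope are already available as separate maps, and ScopeResolver is a hypothetical class, not Stanbol's implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Stanbol code) of how the four scopes combine into
 * the properties one engine sees for one request. Later putAll calls override
 * earlier ones, so the order of the calls encodes the priority.
 */
final class ScopeResolver {

    private ScopeResolver() {}

    static Map<String, Object> effective(Map<String, ?> chain,
                                         Map<String, ?> chainAndEngine,
                                         Map<String, ?> request,
                                         Map<String, ?> requestAndEngine) {
        Map<String, Object> result = new HashMap<>(chain); // lowest priority
        result.putAll(chainAndEngine);
        result.putAll(request);
        result.putAll(requestAndEngine);                   // highest priority
        return result;
    }
}
```

With enhancer.min-confidence=0.5 in the chain scope and 0.75 in the chain and engine scope, effective(..) yields 0.75 for that engine unless the request supplies its own value.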
Chain and/or chain and engine scoped properties are configured with the Enhancement Chain definition. Request and/or request and engine scoped properties can be specified as query parameters of the POST request, or via the Java API by accessing the Request Properties content part. See the following sections for detailed information.
Using Enhancement Properties
Enhancement Properties are consumed by Enhancement Engines. This section describes how engine implementors can retrieve the Enhancement Properties of a request, i.e. within calls to the computeEnhancements(..) method.
In version 0.12.1 and 1.*, EnhancementProperties are contained in the ContentItem passed to the EnhancementEngine. The EnhancementEngineHelper utility has methods to access them. The following listing shows the code needed to get the Enhancement Properties from the passed ContentItem.
    @Override
    public final void computeEnhancements(ContentItem ci) throws EngineException {
        Map<String,Object> enhancementProps = EnhancementEngineHelper.getEnhancementProperties(this, ci);
        // [..]
    }
With 2.0.0 the EnhancementEngine API will change so that the EnhancementProperties are passed as an additional parameter.
    @Override
    public final void computeEnhancements(ContentItem ci, Map<String,Object> enhancementProps) throws EngineException {
        // [..]
    }
The Map<String,Object> containing the EnhancementProperties is a read/write-able copy of the EnhancementProperties passed with the ContentItem. This means that EnhancementEngine implementations are free to change the contents of that map. Those changes will not affect the state of the ContentItem.
The keys in the map are the string keys of the passed Enhancement Properties (e.g. enhancer.max-suggestions or enhancer.engines.dereference.fields). Values can be any Object. Arrays and Collections may be used for multi-valued properties. The EnhancementEngineHelper utility provides methods to convert values to the expected types.
    //define supported enhancement properties as constants
    public static final String MAX_SUGGESTIONS = "enhancer.max-suggestions";
    public static final String DEREFERENCED_FIELDS = "enhancer.engines.dereference.fields";
    // [..]
    @Override
    public final void computeEnhancements(ContentItem ci) throws EngineException {
        Map<String,Object> enhProp = EnhancementEngineHelper.getEnhancementProperties(this, ci);
        Integer maxSuggestions = EnhancementEngineHelper.getFirstConfigValue(
            this, ci, enhProp, MAX_SUGGESTIONS, Integer.class);
        Collection<String> fields = EnhancementEngineHelper.getConfigValues(
            this, ci, enhProp, DEREFERENCED_FIELDS, String.class);
    }
There are also parseConfig*(..) methods that directly parse a given object value. These methods do not throw an EnhancementPropertyException. Note also the get*ConfigValue(Dictionary<String,Object>, ...) methods, which can be used to parse the OSGi component configuration.
Definition of Chain scoped Enhancement Properties
Chain scoped EnhancementProperties are represented as RDF in the ExecutionPlan. In 0.12.1 and 1.* the Chain#getExecutionPlan() method provides the ExecutionPlan. Therefore the most commonly used Chain implementations were extended to support the configuration of chain scoped Enhancement Properties.
Starting from 0.12.1 the ListChain, WeightedChain and GraphChain allow the configuration of EnhancementProperties:
- chain and engine scoped properties are defined as parameters of the engines using the syntax {engine-name}; {property-name-1}={value-1},{value-2}; {property-name-2}={value-1};
- chain scoped properties can be configured using the OSGi property key stanbol.enhancer.chain.chainproperties with the syntax {property-name-1}={value-1},{value-2}. NOTE that ';' is NOT supported as a separator for multiple properties, because OSGi configurations already define a way to pass multiple values.
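The engine entry syntax can be parsed by splitting on ';' and then on '=' and ','. The following EngineEntry parser is a hypothetical sketch, not the actual Stanbol implementation. It assumes that escapes such as \= in a .cfg file were already removed when the OSGi configuration was read. It treats parts without '=' (such as optional in tika;optional) as flags without values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical parser (not the Stanbol implementation) for one engine entry
 * of a chain configuration, e.g.
 *   "dbpedia-fst; enhancer.max-suggestions=10; enhancer.min-confidence=0.8"
 */
final class EngineEntry {

    /** Name of the engine, e.g. "dbpedia-fst". */
    final String engine;
    /** Property name (or flag) mapped to its values, in configuration order. */
    final Map<String, List<String>> parameters = new LinkedHashMap<>();

    private EngineEntry(String engine) {
        this.engine = engine;
    }

    static EngineEntry parse(String entry) {
        String[] parts = entry.split(";");
        EngineEntry result = new EngineEntry(parts[0].trim());
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            if (part.isEmpty()) {
                continue; // tolerate empty parts such as ";;"
            }
            int eq = part.indexOf('=');
            List<String> values = new ArrayList<>();
            if (eq >= 0) {
                // multiple values are separated by ','
                for (String value : part.substring(eq + 1).split(",")) {
                    if (!value.trim().isEmpty()) {
                        values.add(value.trim());
                    }
                }
            }
            // a part without '=' (e.g. "optional") is stored as a flag with no values
            result.parameters.put(eq >= 0 ? part.substring(0, eq).trim() : part, values);
        }
        return result;
    }
}
```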
All EnhancementProperties configured with a Chain are written as RDF to the ExecutionPlan. Chain scoped properties are added directly to the ep:ExecutionPlan instance, while chain and engine scoped properties are added to the ep:ExecutionNode of the respective engine.
The following figure shows an example of Enhancement Properties configured for a WeightedChain.
The figure shows that for the dbpedia-fst engine the maximum number of suggestions is set to 10 and the minimum confidence value to 0.8. For the dbpedia-dereference engine the dereferenced languages are set to English, German and Spanish. Finally, a chain scoped property sets the maximum number of suggestions for the whole chain to 5. However, this has no effect on the dbpedia-fst engine, because its engine-specific configuration overrides this chain-wide property.
The following listing shows the exact same configuration in the .cfg format.
    stanbol.enhancer.chain.name="dbpedia-linking"
    stanbol.enhancer.chain.weighted.chain=["tika;optional","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-chunker", "dbpedia-fst;\
        enhancer.max-suggestions\=10;\
        enhancer.min-confidence\=0.8", "dbpedia-dereference;\
        enhancer.engines.dereference.languages\=en,de,es"]
    stanbol.enhancer.chain.chainproperties=["enhancer.max-suggestions\=5"]
NOTE: With version 2.* of the enhancer it will be possible to pass or reference an ExecutionPlan directly as an RDF graph. This will also allow managing and configuring chain scoped enhancement properties in RDF.
Definition of Request scoped Enhancement Properties
Request and request and engine scoped EnhancementProperties are commonly called Request Properties. They can be passed as query parameters with enhancement requests, or set directly in the RequestProperties content part via the Java API.
Request Properties encoding
Request properties use the following encoding:
- request scoped enhancement properties are passed directly by their key
- request and engine scoped enhancement properties are passed using {engine-name}:{property-name}
As an example, the request property enhancer.max-suggestions=5 would set the maximum number of suggestions for all engines to five. In contrast, the request property dbpedia-fst:enhancer.max-suggestions=10 would set the maximum number of suggestions for the DBpedia FST linking engine to ten. If both request properties are passed, the DBpedia FST linking engine may suggest up to ten entities, while all other engines return at most five suggestions.
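Decoding the request properties for one engine involves three steps. Keep the request scoped keys, pick the keys prefixed with that engine's name, and let the engine-specific values win. The following is a hypothetical sketch. RequestPropertyDecoder is not a Stanbol class, and the sketch assumes that engine names do not contain ':'.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical decoder (not Stanbol code) for the Request Properties encoding:
 * "key" is request scoped, "{engine-name}:key" is request and engine scoped.
 */
final class RequestPropertyDecoder {

    private RequestPropertyDecoder() {}

    /** Returns the request properties that apply to the given engine. */
    static Map<String, String> forEngine(Map<String, String> encoded, String engineName) {
        Map<String, String> requestScoped = new HashMap<>();
        Map<String, String> engineScoped = new HashMap<>();
        for (Map.Entry<String, String> e : encoded.entrySet()) {
            String key = e.getKey();
            int sep = key.indexOf(':'); // assumes engine names contain no ':'
            if (sep < 0) {
                requestScoped.put(key, e.getValue());
            } else if (key.substring(0, sep).equals(engineName)) {
                engineScoped.put(key.substring(sep + 1), e.getValue());
            } // properties addressed to other engines are ignored
        }
        requestScoped.putAll(engineScoped); // request and engine scope has priority
        return requestScoped;
    }
}
```

For the example above, forEngine(params, "dbpedia-fst") yields enhancer.max-suggestions=10, while every other engine gets 5.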
Parsing Request Properties via the Enhancer RESTful Service
Starting with 0.12.1, Enhancement Properties can be passed as query parameters of Enhancement Requests. For request scoped properties the property name is used as the parameter. Request and engine scoped properties need to use {engine-name}:{property-name} as the parameter.
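The following curl request passes enhancer.max-suggestions=5 to all engines and sets enhancer.min-confidence individually for the dbpedia-linking (0.33) and conf-filter (0.85) engines. The URL is quoted so that the shell does not interpret the '&' characters: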
    curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
        --data "The Eiffel Tower is located in Paris." \
        "http://localhost:8080/enhancer?enhancer.max-suggestions=5&dbpedia-linking:enhancer.min-confidence=0.33&conf-filter:enhancer.min-confidence=0.85"
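The same request can also be sent from Java. This sketch uses the java.net.http client available since Java 11. EnhancerClient and its methods are hypothetical names, and the endpoint is the http://localhost:8080/enhancer URL from the curl example. URL-encoding the parameter names turns ':' into %3A, which the server decodes back into the {engine-name}:{property-name} form.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical client sketch for passing Enhancement Properties as query parameters. */
final class EnhancerClient {

    private EnhancerClient() {}

    /** Appends the Enhancement Properties as URL-encoded query parameters. */
    static URI enhancerUri(String endpoint, Map<String, String> properties) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : properties.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return URI.create(properties.isEmpty() ? endpoint : endpoint + "?" + query);
    }

    /** POSTs plain text and returns the Turtle serialization of the enhancements. */
    static String enhance(URI uri, String text) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "text/turtle")
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Enhancer returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the generated URI predictable. The server does not depend on the parameter order.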
Request Properties Java API
In version 0.12.1 and 1.*, Request Properties (request and request and engine scoped EnhancementProperties) are stored in the ContentPart with the URI urn:apache.org:stanbol.enhancer:request.properties. The ContentItemHelper utility provides methods to retrieve and/or initialize this content part.
The RequestProperties content part uses a simple Map<String,Object>. Keys use the Request Properties encoding. Values can be of all types supported by enhancement properties.
The following code segment shows how to set Request Properties via the Java API.
    ContentItem ci; //the content item
    Map<String,Object> reqProp = ContentItemHelper.initEnhancementPropertiesContentPart(ci);
    //set min confidence to 0.5 for all engines
    reqProp.put("enhancer.min-confidence", "0.5");
    //set max suggestions to 10 for the linking engine
    reqProp.put("linking:enhancer.max-suggestions", "10");
Note that with enhancer 2.0 the request properties content part will be removed and replaced by the EnhancementJob API (TBD).
Enhancement Engine Support
Enhancement Properties MUST BE supported by Enhancement Engine implementations.
NOTE: the properties used in the examples above are NOT supported by the 0.12.1 release. The definition of global enhancement properties and their support by the most commonly used enhancement engines is planned for a release before 1.0.0. The epic STANBOL-1343 tracks the progress. Please also check the documentation of the specific engines for details about supported properties.
The only engine that already supports Enhancement Properties in the 0.12.1 release is the Entityhub Dereference Engine.