Entityhub Dereference Engine
This is an Entity Dereference Engine for the Stanbol Entityhub. It supports dereferencing Entities from:
- Entityhub: locally managed Entities (the /entityhub endpoint)
- Managed and Referenced Sites (the /entityhub/site/{site-name} endpoints)
- SiteManager: union view over all Managed and Referenced Sites (the /entityhub/sites endpoint)
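The three kinds of sources correspond to the values accepted by the Site option described under Configuration. The following Python sketch only illustrates that correspondence by returning the REST endpoint that matches a Site value. The function name endpoint_for_site and the default base URL (taken from the curl example at the end of this page) are hypothetical, and the sketch makes no claim about how the engine itself accesses the Sites.

```python
# Hypothetical helper illustrating which Entityhub REST endpoint corresponds
# to a value of the "Site" option (enhancer.engines.dereference.entityhub.siteId).
# It says nothing about how the engine itself accesses the Sites.

def endpoint_for_site(site_id: str, base: str = "http://localhost:8080") -> str:
    """Return the Entityhub endpoint URL for a Site configuration value."""
    if site_id == "*":
        # SiteManager: union view over all Managed and Referenced Sites
        return f"{base}/entityhub/sites"
    if site_id == "entityhub":
        # locally managed Entities
        return f"{base}/entityhub"
    # any other value names a single Managed or Referenced Site
    return f"{base}/entityhub/site/{site_id}"
```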
Configuration
The following figure shows the configuration dialog of the Entityhub Dereference Engine:
The following configuration parameters are defined by the core Entity Dereference Engine. Actual Dereference Engine implementations might not support all of them.
- Name (stanbol.enhancer.engine.name): The name of the Enhancement engine
- Site (enhancer.engines.dereference.entityhub.siteId): The name of the Entityhub Site to be used for dereferencing. The value '*' will dereference against the SiteManager (union over all Referenced and Managed Sites) and 'entityhub' will use the Entityhub itself for dereferencing.
- Fallback Mode (enhancer.engines.dereference.fallback): In fallback mode, Entities are only scheduled for dereferencing if no data for them is yet present in the Enhancement results (see the documentation of the Entity Dereference Engine for more information and possible usage scenarios).
- URI Prefix (enhancer.engines.dereference.uriPrefix): Allows configuring [0..*] prefixes of Entity URIs that can be dereferenced by this engine. If present, only Entities that match one of those prefixes are scheduled to be dereferenced by the EntityDereferencer.
- URI Pattern (enhancer.engines.dereference.uriPattern): Allows configuring [0..*] regex patterns for matching Entity URIs. If present, only Entities matching at least one of the configured patterns will be scheduled for dereferencing.
- Dereference only Content Language Literals (enhancer.engine.dereference.filterContentlanguages): If enabled, only literals with the same language as the language detected for the content will be dereferenced. Literals without a language tag are always dereferenced.
- Dereferenced Fields (enhancer.engines.dereference.fields): The fields (in RDF terminology 'properties') to be dereferenced. QNames (e.g. rdfs:label) can be used for the configuration. This Engine supports the use of FieldMappings for the configuration (see the corresponding sub-section for details).
- Dereference LD Path (enhancer.engines.dereference.ldpath): The LD Path Language allows defining powerful selectors for dereferenced Entities.
- Use Shared Thread Pool (enhancer.engines.dereference.entityhub.threads.shared): If enabled, multiple configured Entityhub Dereference Engines will use a shared Thread Pool. The shared Thread Pool is provided by a separate Component that can be configured independently (see next sub-section). In most cases it is better to enable this feature and to add additional threads to the shared pool if necessary.
- Dereference Threads (enhancer.engines.dereference.entityhub.threads.size): If no shared Thread Pool is used, this configures the size of the thread pool used only by this engine. For values < 1 no Thread Pool is created and the calling thread is used to dereference Entities.
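The interplay between 'Dereference only Content Language Literals' and the explicitly configured Dereference Languages (listed with the additional properties below) can be summarized as a decision function. This Python sketch is an illustration under stated assumptions, not the engine's code: the function name accept_literal is hypothetical, language tags are compared by exact case-insensitive match, and untagged literals are assumed to be kept in every mode.

```python
from typing import Iterable, Optional


def accept_literal(lang: Optional[str],
                   content_language: Optional[str],
                   filter_content_languages: bool,
                   configured_languages: Iterable[str] = ()) -> bool:
    """Decide whether a literal tagged `lang` is dereferenced (sketch).

    lang=None stands for a literal without a language tag.
    """
    languages = {code.lower() for code in configured_languages}
    if lang is None:
        # Untagged literals are always kept (documented for the content
        # language filter, ASSUMED here for the other modes as well).
        return True
    lang = lang.lower()
    if not filter_content_languages and not languages:
        # No restriction configured: literals of any language are kept.
        return True
    if lang in languages:
        # Explicitly configured languages are always dereferenced.
        return True
    # ASSUMPTION: exact (case-insensitive) match against the detected language.
    return (filter_content_languages
            and content_language is not None
            and lang == content_language.lower())
```

For example, with the filter enabled and the content detected as English, a German literal is only kept if "de" is also listed in the Dereference Languages.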
Additional Supported Properties that are not included in the configuration form:
- Dereference Properties (enhancer.engines.dereference.references): The list of properties that reference Entities. By default fise:entity-reference is used. A triple pattern (null, {entity-reference}, null) is used for each configured property URI. All unique objects of type URI are considered as Entities to be dereferenced. NOTE that a configured URI Prefix and/or URI Pattern is also applied to this list of Entity URIs.
- Dereference Languages (enhancer.engines.dereference.languages): A set of languages that are dereferenced. Even if 'Dereference only Content Language Literals' is active, explicitly configured languages will still get dereferenced. If this property is not present and 'Dereference only Content Language Literals' is deactivated, literals of any language will get dereferenced.
- Service Ranking (service.ranking): The OSGi service ranking. This only has an effect if there are two engines with the same name. In such cases the one with the higher service ranking will be called.
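Collecting the Entities to dereference, as described for Dereference Properties, together with the URI Prefix, URI Pattern and Fallback Mode options, might look like the following Python sketch. Assumptions: triples are plain tuples where URIs are str and literals use a hypothetical Literal tuple; an Entity passes if it matches any prefix or any pattern (the text does not say how the two options combine); patterns may match anywhere in the URI (re.search); and Fallback Mode is modeled as a set of URIs that already have data. FISE_ENTITY_REFERENCE is the full URI of fise:entity-reference.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Sequence, Tuple, Union

FISE_ENTITY_REFERENCE = "http://fise.iks-project.eu/ontology/entity-reference"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal; plain str objects are URIs."""
    value: str
    lang: Optional[str] = None


Triple = Tuple[str, str, Union[str, Literal]]


def collect_entities(triples: Iterable[Triple],
                     reference_properties: Sequence[str] = (FISE_ENTITY_REFERENCE,),
                     prefixes: Sequence[str] = (),
                     patterns: Sequence[str] = (),
                     already_present: Iterable[str] = ()) -> List[str]:
    """Return unique Entity URIs to schedule for dereferencing (sketch)."""
    properties = set(reference_properties)
    regexes = [re.compile(p) for p in patterns]
    skip = set(already_present)  # Fallback Mode: URIs that already have data
    selected: List[str] = []
    seen = set()
    for _subject, predicate, obj in triples:
        # Triple pattern (null, {reference property}, null); URI objects only.
        if predicate not in properties or not isinstance(obj, str):
            continue
        if obj in seen:
            continue
        seen.add(obj)
        if prefixes or regexes:
            # ASSUMPTION: matching any prefix OR any pattern is sufficient.
            if not (any(obj.startswith(p) for p in prefixes)
                    or any(r.search(obj) for r in regexes)):
                continue
        if obj in skip:
            continue
        selected.append(obj)
    return selected
```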
Shared Thread Pool Configuration
The Shared Thread Pool is a singleton Component used by all Entityhub Dereference Engines with the 'Use Shared Thread Pool' option enabled. It has only a single configuration option (enhancer.engines.dereference.entityhub.sharedthreadpool.size) that sets the size of the thread pool.
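The thread-pool options can be summarized with Python's concurrent.futures standing in for the engine's Java thread pools. This is a minimal sketch with hypothetical names (get_shared_pool, make_executor, dereference_all), assuming one lazily created shared pool, an engine-private pool otherwise, and no pool at all for sizes < 1, in which case the calling thread does the work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")

_shared_pool: Optional[ThreadPoolExecutor] = None
_shared_lock = threading.Lock()


def get_shared_pool(size: int) -> ThreadPoolExecutor:
    """Lazily create the single shared pool (stands in for the singleton Component).

    Only the first call's size is used; later calls reuse the existing pool.
    """
    global _shared_pool
    with _shared_lock:
        if _shared_pool is None:
            _shared_pool = ThreadPoolExecutor(max_workers=size)
        return _shared_pool


def make_executor(use_shared: bool, threads_size: int,
                  shared_size: int) -> Optional[ThreadPoolExecutor]:
    """Pick the executor an engine would use; None means 'no pool'."""
    if use_shared:
        return get_shared_pool(shared_size)
    if threads_size < 1:
        return None  # no pool: the calling thread dereferences
    return ThreadPoolExecutor(max_workers=threads_size)  # engine-private pool


def dereference_all(entities: Iterable[T], dereference: Callable[[T], R],
                    executor: Optional[ThreadPoolExecutor]) -> List[R]:
    """Dereference all entities, in the caller's thread if executor is None."""
    if executor is None:
        return [dereference(e) for e in entities]
    return list(executor.map(dereference, entities))
```

Sharing one pool bounds the total number of dereferencing threads across engines, which is why enabling it and enlarging the shared pool is usually preferable to many private pools.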
Advanced Dereference Configurations
Entityhub Field Mapping Support
The enhancer.engines.dereference.fields configuration does support the Entityhub Field Mapping language.
FieldMappings do use the following syntax:
[!]FieldPattern [| Filter] [> Mapping]
- an optional Exclusion, indicated by '!' as the first character of the mapping, used to exclude fields that are matched by the FieldPattern part (e.g. !foaf:* will exclude all properties of the FOAF namespace). Exclusions are only useful if a wildcard is used (e.g. foaf:* together with !foaf:mbox).
- the required FieldPattern supports the definition of prefixes such as http://xmlns.com/foaf/0.1/* or foaf:*
- the optional Filter part allows filtering by specific languages (e.g. @=null;en;de; will only dereference English and German literals as well as literals with no language tag), typed literals (e.g. d=xsd:dateTime;xsd:date) or URI values (e.g. d=entityhub:ref). Filters will also try to convert values to the specified data type (e.g. d=xsd:double would convert xsd:float values to xsd:double; string literals that can be parsed as double would be converted as well).
- an optional Mapping can be used to copy values to another field (e.g. foaf:name > schema:name would copy all FOAF names to the schema.org name field)
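The syntax above can be sketched as a small Python parser plus a matcher for wildcards and exclusions. This is an illustration, not the Entityhub parser: the names FieldMapping, parse_field_mapping, expand, matches and is_selected are hypothetical, only the foaf and schema prefixes are known to expand, the Filter part is kept as an unparsed string, and a field is assumed to be selected when at least one inclusion and no exclusion matches.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Only the prefixes used in this section; the real engine knows many more.
NAMESPACES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "http://schema.org/",
}


@dataclass
class FieldMapping:
    pattern: str                       # required FieldPattern, e.g. "foaf:*"
    exclude: bool = False              # leading '!'
    filter_spec: Optional[str] = None  # raw Filter part, e.g. "@=null;en;de;"
    target: Optional[str] = None       # Mapping target, e.g. "schema:name"


def parse_field_mapping(text: str) -> FieldMapping:
    """Parse '[!]FieldPattern [| Filter] [> Mapping]' (simplified sketch)."""
    text = text.strip()
    exclude = text.startswith("!")
    if exclude:
        text = text[1:].strip()
    target = None
    if ">" in text:
        text, target = [part.strip() for part in text.split(">", 1)]
    filter_spec = None
    if "|" in text:
        text, filter_spec = [part.strip() for part in text.split("|", 1)]
    return FieldMapping(text.strip(), exclude, filter_spec, target)


def expand(name: str) -> str:
    """Expand a QName such as foaf:name using NAMESPACES; leave URIs as-is."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in NAMESPACES:
        return NAMESPACES[prefix] + local
    return name


def matches(pattern: str, field_uri: str) -> bool:
    """A trailing '*' acts as a prefix wildcard; otherwise exact match."""
    full = expand(pattern)
    if full.endswith("*"):
        return field_uri.startswith(full[:-1])
    return field_uri == full


def is_selected(field_uri: str, mappings: Iterable[FieldMapping]) -> bool:
    """ASSUMPTION: selected if an inclusion matches and no exclusion does."""
    mappings = list(mappings)
    included = any(not m.exclude and matches(m.pattern, field_uri) for m in mappings)
    excluded = any(m.exclude and matches(m.pattern, field_uri) for m in mappings)
    return included and not excluded
```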
NOTE that Field Mappings configured for the Entityhub Dereference Engine are overridden by Field Mappings passed as Enhancement Properties.
LDPath support
The LD Path Language is an alternative to most of the features supported by the Entityhub Field Mapping language. In particular, Filters and Mappings SHOULD be expressed using LD Path.
The only advantage of the Field Mapping language is that it supports wildcards and exclusions. So if one wants to dereference all properties of a specific namespace, this can only be specified using the Field Mapping language.
The following example shows a configuration that dereferences all schema.org properties and also uses LD Path to map some non-schema.org properties to schema.org properties:
enhancer.engines.dereference.fields="schema:*"
enhancer.engines.dereference.ldpath=["@prefix schema : <http://schema.org/>;",
    "@prefix dct : <http://purl.org/dc/terms/>;",
    "schema:name = (rdfs:label | dct:title | dc:title | foaf:name | skos:prefLabel);",
    "schema:alternateName = skos:altLabel;",
    "schema:image = foaf:depiction;",
    "schema:homepage = foaf:homepage;"]
NOTE: when used in an OSGi *.cfg file, spaces and = need to be escaped with \ and all line breaks need to be removed.
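The escaping rules from the note can be applied mechanically. The following Python sketch uses a hypothetical helper name, applies only the two escapes the note mentions (it does not handle other characters that may need escaping), and joins the lines after stripping their surrounding whitespace, so spaces at line edges are dropped.

```python
def to_cfg_line(key: str, value: str) -> str:
    """Flatten and escape a multi-line value for an OSGi *.cfg file (sketch).

    Applies only the rules from the note: remove line breaks and escape
    spaces and '=' with a backslash. Whitespace at the start and end of
    each line is dropped while joining.
    """
    flat = "".join(line.strip() for line in value.splitlines())
    escaped = flat.replace(" ", "\\ ").replace("=", "\\=")
    return f"{key}={escaped}"
```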
Supported Enhancement Properties
(since version 0.12.1 with STANBOL-1287)
The following Enhancement Properties are supported by the Entityhub Dereference Engine:
- Dereference Properties (enhancer.engines.dereference.references): A collection of properties that reference Entities. Passed values will be merged (union) with those statically configured for the Engine.
- Dereference Languages (enhancer.engines.dereference.languages): A set of languages that are dereferenced. Even if 'Dereference only Content Language Literals' is active, explicitly configured languages will still get dereferenced.
- Dereferenced Fields (enhancer.engines.dereference.fields): The fields (in RDF terminology 'properties') to be dereferenced. QNames (e.g. rdfs:label) can be used for the configuration. This Engine supports the use of FieldMappings for the configuration. Dereferenced Fields passed as Enhancement Property will override the values configured for the Engine.
- Dereference LD Path (enhancer.engines.dereference.ldpath): The LD Path Language allows defining powerful selectors for dereferenced Entities. An LD Path program passed as Enhancement Property will be executed in addition to those configured for the Engine.
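How values passed as Enhancement Properties combine with the Engine configuration (union for Dereference Properties, override for Dereferenced Fields, addition for LD Path) can be sketched as follows. The names DereferenceSettings and effective_settings are hypothetical, an absent or empty value is assumed to count as 'not passed', and Dereference Languages are left out because the text does not say whether they are merged or overridden.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DereferenceSettings:
    references: List[str] = field(default_factory=list)       # Dereference Properties
    fields: List[str] = field(default_factory=list)           # Dereferenced Fields
    ldpath_programs: List[str] = field(default_factory=list)  # Dereference LD Path


def effective_settings(engine: DereferenceSettings,
                       references: Optional[List[str]] = None,
                       fields: Optional[List[str]] = None,
                       ldpath: Optional[str] = None) -> DereferenceSettings:
    """Combine the Engine configuration with per-request Enhancement Properties."""
    # Union, keeping the order of first occurrence.
    merged_refs = list(dict.fromkeys(engine.references + (references or [])))
    # Override: passed fields replace the configured ones.
    effective_fields = list(fields) if fields else list(engine.fields)
    # Addition: a passed program runs in addition to the configured ones.
    programs = engine.ldpath_programs + ([ldpath] if ldpath else [])
    return DereferenceSettings(merged_refs, effective_fields, programs)
```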
As an example, the following query parameters would instruct all Entityhub Dereference Engines used in the enhancement chain to dereference only English and German literals.
curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
    --data "The Eiffel Tower is located in Paris." \
    "http://localhost:8080/enhancer?enhancer.engines.dereference.languages=en&enhancer.engines.dereference.languages=de"
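The same request can be built in Python with the standard library; urlencode(..., doseq=True) repeats the query parameter once per language, which is what the curl example does by hand. This is a sketch assuming a Stanbol instance at the URL used above; build_enhancer_request is a hypothetical helper.

```python
from typing import Iterable
from urllib.parse import urlencode
from urllib.request import Request

LANGUAGES_PARAM = "enhancer.engines.dereference.languages"


def build_enhancer_request(text: str, languages: Iterable[str],
                           base: str = "http://localhost:8080/enhancer") -> Request:
    """Build the POST request from the curl example (not sent here)."""
    # doseq=True emits one key=value pair per language.
    query = urlencode({LANGUAGES_PARAM: list(languages)}, doseq=True)
    return Request(f"{base}?{query}",
                   data=text.encode("utf-8"),
                   headers={"Accept": "text/turtle", "Content-Type": "text/plain"},
                   method="POST")
```

Passing the result to urllib.request.urlopen sends it, which requires a running Stanbol instance.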