This project has retired. For details please refer to its Attic page.
Apache Stanbol - Entity Dereference Engine

Entity Dereference Engine

since version 0.12.0 with STANBOL-1222

The Dereference Engine is responsible for retrieving information about Entities referenced by the Enhancement Results and adding it to the metadata of the Content Item.

Consumed information

The Entity Dereference Engine consumes the RDF enhancements generated by other Enhancement Engines. In particular, it processes the fise:entity-reference properties used by fise:EntityAnnotation and fise:TopicAnnotation, as these link to the Entities that need to be dereferenced.
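
As a rough illustration of this selection step, the sketch below collects the distinct objects of all fise:entity-reference triples. The Triple class and the collectEntityReferences helper are hypothetical stand-ins for the RDF API used by Stanbol, and the namespace URI is assumed to be the usual fise ontology namespace.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an RDF triple; Stanbol itself works on a real RDF graph. */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class EntityReferenceCollector {
    /** Assumed full URI of the fise:entity-reference property. */
    static final String ENTITY_REFERENCE =
            "http://fise.iks-project.eu/ontology/entity-reference";

    /** Returns the distinct entity URIs referenced by the enhancements, in encounter order. */
    static Set<String> collectEntityReferences(List<Triple> enhancements) {
        Set<String> entities = new LinkedHashSet<>();
        for (Triple triple : enhancements) {
            if (ENTITY_REFERENCE.equals(triple.predicate)) {
                entities.add(triple.object); // the set drops repeated references
            }
        }
        return entities;
    }
}
```

Using a set means an Entity referenced by several annotations would only be scheduled once in this sketch.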

Design

The Entity Dereference Engine cannot be used directly to dereference Entities. It provides the base functionality for implementing dereference engines for different technologies and services. One such implementation is the Entityhub Dereference Engine, which dereferences Entities via the Stanbol Entityhub.

The module providing this infrastructure is:

<dependency>
    <groupId>org.apache.stanbol</groupId>
    <artifactId>org.apache.stanbol.enhancer.dereference.core</artifactId>
    <version>${stanbol-version}</version>
</dependency>

This module provides the following main components:

  1. EnhancementEngine implementation that
    • processes the Enhancement results and schedules Entities to be dereferenced.
    • supports the use of a thread pool to dereference multiple entities concurrently.
    • supports EnhancementProperties for chain and request scoped configuration of the dereferenced information.
  2. Definition of the EntityDereferencer interface used to dereference scheduled entities. This interface needs to be implemented by Dereference Engines for different technologies/services (e.g. the Entityhub).

In addition the module also provides utilities for managing the enhancement engine configuration as well as parsed Enhancement Properties.

Configuration

The following configuration parameters are defined by the core Entity Dereference Engine. Actual Dereference Engine implementations might not support all of them.

NOTE that the configurations for Dereference Languages, Dereferenced Fields and Dereference LD Path are only managed by the core Entity Dereference Engine implementation. Whether such properties are actually supported depends on the EntityDereferencer implementation in use.

Building a Custom Entity Dereference Engine

This section describes the steps necessary for building a custom Entity Dereference Engine.

Entity Dereferencer implementation

The EntityDereferencer interface is used to dereference Entities. It also allows the EntityDereferenceEngine to check whether OfflineMode is supported and to retrieve the ExecutorService.

The following listing shows the signature of the EntityDereferencer interface:

public interface EntityDereferencer {
    boolean supportsOfflineMode();
    ExecutorService getExecutor();
    boolean dereference(UriRef entity, MGraph graph, Lock writeLock,
        DereferenceContext dereferenceContext) throws DereferenceException;
}

supportsOfflineMode needs to return true if the implementation does not need to access a remote service for dereferencing entities, and false if it requires remote services. If Apache Stanbol is started with Offline Mode enabled, EntityDereferencer implementations that do not support Offline Mode will not be called. As a result, no Entities will be dereferenced from services that require an internet connection.

The ExecutorService is used by the EntityDereferenceEngine to dereference entities concurrently. This means that the dereference(..) method of the EntityDereferencer implementation will be called in the context of threads provided by the returned ExecutorService. Returning null will deactivate this feature.

NOTE that all EntityDereferencer implementations MUST BE thread safe, as multiple threads will be used to call the dereference(..) method. Even if getExecutor() returns null, the EnhancementJobManager will still use multiple threads for calling the EntityDereferenceEngine, so dereference(..) will be called from different thread contexts.
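
The role of the returned ExecutorService can be sketched as follows. This is a minimal illustration under assumptions, not the actual EntityDereferenceEngine code: DereferenceScheduler and dereferenceAll are hypothetical names, and each Callable stands in for one dereference(..) call.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class DereferenceScheduler {
    /**
     * Runs one task per scheduled entity and returns how many reported success.
     * A null executor means the tasks run one after another in the calling thread.
     */
    static int dereferenceAll(List<Callable<Boolean>> tasks, ExecutorService executor) {
        int dereferenced = 0;
        try {
            if (executor == null) {
                for (Callable<Boolean> task : tasks) {
                    if (task.call()) dereferenced++;
                }
            } else {
                // invokeAll blocks until every submitted task has completed
                for (Future<Boolean> result : executor.invokeAll(tasks)) {
                    if (result.get()) dereferenced++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while dereferencing", e);
        } catch (Exception e) {
            throw new IllegalStateException("dereferencing failed", e);
        }
        return dereferenced;
    }
}
```

In both branches the tasks may run concurrently with work from other requests, which is why the dereferencer itself has to be thread safe regardless of the executor.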

The dereference(..) method is used to dereference the Entity with the passed UriRef. Dereferenced information is expected to be written to the passed MGraph. While writing dereferenced information to the graph, the passed write lock MUST BE acquired. The DereferenceContext provides the configuration (see the following section for more information). The method is expected to return true if the entity was successfully dereferenced and false otherwise.
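
The locking contract can be illustrated with a simplified sketch. The types are stand-ins chosen so the example is self-contained: a String replaces the UriRef, a Set of strings replaces the MGraph, and InMemoryDereferencer is a hypothetical class rather than part of Stanbol.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.Lock;

/** Hypothetical, simplified dereferencer: URIs and graph statements are plain strings. */
final class InMemoryDereferencer {
    /** Entity data keyed by URI; only read after construction. */
    private final Map<String, Set<String>> source;

    InMemoryDereferencer(Map<String, Set<String>> source) {
        this.source = source;
    }

    /** Returns true if the entity was found and its data was copied to the graph. */
    boolean dereference(String entity, Set<String> graph, Lock writeLock) {
        // Retrieve the data first so the lock is not held during a possibly slow lookup
        Set<String> data = source.get(entity);
        if (data == null) {
            return false; // unknown entity: nothing is written
        }
        writeLock.lock();
        try {
            graph.addAll(data);
        } finally {
            writeLock.unlock(); // always release, even if writing fails
        }
        return true;
    }
}
```

Keeping the retrieval outside the locked region is a design choice of this sketch: it limits the time other threads are blocked from writing to the shared graph.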

Configuration API

Configuration Parameters supported by the Core Entity Dereference Engine implementation are defined in the DereferenceConstants class.

DereferenceEngineConfig

The DereferenceEngineConfig class provides easy, API-based access to those configuration parameters. It is instantiated using the Dictionary passed by OSGI as part of the ComponentContext.
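
To give an idea of what such typed access over a Dictionary involves, here is a minimal sketch. DictionaryConfig and its methods are hypothetical and do not reproduce the DereferenceEngineConfig API. The sketch assumes that multi-valued properties may arrive as a single String, a String[] or a Collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Dictionary;
import java.util.List;

/** Hypothetical helper; not the actual DereferenceEngineConfig API. */
final class DictionaryConfig {
    private final Dictionary<String, Object> config;

    DictionaryConfig(Dictionary<String, Object> config) {
        this.config = config;
    }

    /** Reads a boolean property, accepting Boolean or String values. */
    boolean getBoolean(String key, boolean defaultValue) {
        Object value = config.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.toString());
    }

    /** Reads a multi-valued property given as String, String[] or Collection. */
    List<String> getStrings(String key) {
        Object value = config.get(key);
        List<String> result = new ArrayList<>();
        if (value instanceof String[]) {
            result.addAll(Arrays.asList((String[]) value));
        } else if (value instanceof Collection<?>) {
            for (Object element : (Collection<?>) value) {
                if (element != null) result.add(element.toString());
            }
        } else if (value != null) {
            result.add(value.toString()); // single value
        }
        return result; // empty if the property is not configured
    }
}
```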

DereferenceContext

The DereferenceContext is used to pass request-specific context to the EntityDereferencer implementation.

Note that a single request to the Entity Dereference Engine can schedule multiple Entities to be dereferenced and can therefore result in multiple calls to the EntityDereferencer#dereference(..) method. All such calls will use the same DereferenceContext instance.

Extending the DereferenceContextFactory allows dereference engine implementations to use a custom DereferenceContext. This makes it possible to parse request-specific configuration (e.g. passed via Enhancement Properties) only once per request. The following code snippet shows how to use a custom DereferenceContext with the core EntityDereferenceEngine implementation:

entityDereferenceEngine = new EntityDereferenceEngine(entityDereferencer, engineConfig,
        new DereferenceContextFactory() { //we want to use our own DereferenceContext impl

            @Override
            public DereferenceContext createContext(EntityDereferenceEngine engine,
                    Map<String,Object> enhancementProperties) throws DereferenceConfigurationException {
                //Instantiate custom DereferenceContext
                DereferenceContext dereferenceContext = null; //TODO
                return dereferenceContext;
            }
        });

To initialise the custom DereferenceContext, override the initialise callback:

public class MyDereferenceContext extends DereferenceContext {

    protected MyDereferenceContext(MyDereferenceEngine engine, 
        Map<String,Object> enhancementProps) throws DereferenceConfigurationException {
        super(engine, enhancementProps);
    }

    @Override
    protected void initialise() throws DereferenceConfigurationException {
        //do your custom initialisation here
    }

}

With this code in place, all calls to EntityDereferencer#dereference(..) will be passed an instance of the custom DereferenceContext implementation.

The custom DereferenceContext implementation of the Entityhub Dereference Engine is a good example to start from.
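
What a custom initialisation might look like is sketched below with stand-in classes, so the example compiles on its own. SketchContext imitates the base class under the assumption that the initialise callback runs once during construction. LanguageContext and the property key are hypothetical and used for illustration only.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the base context: stores the properties, then calls initialise(). */
abstract class SketchContext {
    protected final Map<String, Object> enhancementProps;

    protected SketchContext(Map<String, Object> enhancementProps) {
        this.enhancementProps = enhancementProps;
        initialise(); // assumption: the callback runs once, during construction
    }

    protected abstract void initialise();
}

/** Parses a comma separated language list once so every dereference call can reuse it. */
final class LanguageContext extends SketchContext {
    /** Hypothetical property key, used for illustration only. */
    static final String LANGUAGES_KEY = "example.dereference.languages";

    // No field initializer here: it would run after initialise() and overwrite the result
    private Set<String> languages;

    LanguageContext(Map<String, Object> enhancementProps) {
        super(enhancementProps);
    }

    @Override
    protected void initialise() {
        Set<String> parsed = new LinkedHashSet<>();
        Object value = enhancementProps.get(LANGUAGES_KEY);
        if (value != null) {
            for (String lang : value.toString().split(",")) {
                String trimmed = lang.trim();
                if (!trimmed.isEmpty()) parsed.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        languages = Collections.unmodifiableSet(parsed); // read-only view for all callers
    }

    Set<String> getLanguages() {
        return languages;
    }
}
```

Because one context instance serves every dereference(..) call of a request, work done in the callback is paid for once rather than once per entity.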

OSGI Component

Finally each Dereference Engine implementation needs to provide an OSGI component. This component is required for parsing the configuration and for implementing the life cycle.

The following listing provides pseudocode for such a component:

@Component(
    configurationFactory = true, //allow multiple instances
    policy = ConfigurationPolicy.REQUIRE, //a configuration is required
    metatype = true, immediate = true)
@Properties(value={
    @Property(name=PROPERTY_NAME), //the name of the engine
    //Properties supported by the Core Entity Dereference Engine
    @Property(name=EntityhubDereferenceEngine.SITE_ID),
    @Property(name=DereferenceConstants.FALLBACK_MODE, 
        boolValue=DereferenceConstants.DEFAULT_FALLBACK_MODE),
    @Property(name=DereferenceConstants.URI_PREFIX, cardinality=Integer.MAX_VALUE),
    @Property(name=DereferenceConstants.URI_PATTERN, cardinality=Integer.MAX_VALUE),
    @Property(name=DereferenceConstants.FILTER_CONTENT_LANGUAGES, 
        boolValue=DereferenceConstants.DEFAULT_FILTER_CONTENT_LANGUAGES),
    @Property(name=DEREFERENCE_ENTITIES_FIELDS,cardinality=Integer.MAX_VALUE,
        value={"rdfs:comment","geo:lat","geo:long","foaf:depiction","dbp-ont:thumbnail"}),
    @Property(name=DEREFERENCE_ENTITIES_LDPATH, cardinality=Integer.MAX_VALUE),
    /* add also implementation specific properties */
    @Property(name=SERVICE_RANKING,intValue=0)
})
public class YourDereferenceEngineComponent {

    /** support QName configurations */
    @Reference(cardinality=ReferenceCardinality.OPTIONAL_UNARY)
    protected NamespacePrefixService prefixService;

    /** The engine instance registered as OSGI service */
    protected EntityDereferenceEngine entityDereferenceEngine;
    /** The OSGI service registration */
    protected ServiceRegistration engineRegistration;

    @Activate
    protected void activate(ComponentContext ctx) throws ConfigurationException {
        Dictionary<String,Object> properties = ctx.getProperties();
        DereferenceEngineConfig engineConfig = new DereferenceEngineConfig(properties, prefixService);

        /* TODO: parse custom configuration properties */

        /* Initialise the custom EntityDereferencer implementation */
        EntityDereferencer entityDereferencer = null; //TODO

        //create the Entity Dereference Engine instance
        entityDereferenceEngine = new EntityDereferenceEngine(entityDereferencer, engineConfig);

        //register the engine as OSGI service
        engineRegistration = ctx.getBundleContext().registerService(
                new String[]{EnhancementEngine.class.getName(),
                             ServiceProperties.class.getName()},
                entityDereferenceEngine, engineConfig.getDict());
    }

    @Deactivate
    protected void deactivate(ComponentContext context) {
        //Unregister the OSGI service
        if(engineRegistration != null){
            engineRegistration.unregister();
            engineRegistration = null;
        }
        entityDereferenceEngine = null;

        //TODO: close the dereferencer implementation (if required)
    }
}