This project has retired. For details please refer to its Attic page.
Apache Stanbol - ManagedSite

ManagedSite

A ManagedSite allows users to manage a collection of Entities by using the RESTful API of the Entityhub. Unlike the ReferencedSite implementation, it cannot refer to remote services. Therefore all changes to Entities managed by a ManagedSite are performed via the RESTful API of the Entityhub.

Users can configure multiple ManagedSites with the Stanbol Entityhub. They are identified by their id and share the id-space with other Sites (e.g. ReferencedSites). The RESTful services of a ManagedSite are available via the URL pattern

http://{stanbol-instance}/entityhub/site/{siteId}

NOTE: To make this documentation less abstract, it uses a scenario in which someone wants to manage the IPTC Descriptive NewsCodes by using a ManagedSite. Typical Stanbol users will want to manage their own Entities (e.g. Tags/Categories of their CMS) instead.

Manage Entities by using RESTful services

The RESTful API of a ManagedSite is the same as that of other Sites, except that the "/entity" endpoint additionally supports creating, updating and deleting Entities.

The following Example shows how to upload a SKOS vocabulary to a ManagedSite:

curl -i -X PUT -H "Content-Type: application/rdf+xml" -T subject-code.rdf \
    "http://localhost:8080/entityhub/site/iptc/entity"

This example assumes that Stanbol is running on 'localhost' port '8080' and that a ManagedSite with the id 'iptc' was configured. The uploaded file 'subject-code.rdf' contains the IPTC subject-codes. To also upload the vocabulary containing the genres, call

curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
    "http://localhost:8080/entityhub/site/iptc/entity"

Calls like these create or update all Entities contained in the parsed RDF data. To ensure that only a single Entity is created or updated, specify the 'id' parameter:

curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
    "http://localhost:8080/entityhub/site/iptc/entity?id=http://cv.iptc.org/newscodes/genre/Exclusive"

This ignores all other RDF data and updates only the 'genre:Exclusive' entity.
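Entity URIs passed via the 'id' parameter contain characters such as ':' and '/', and some vocabularies also use '#' or '&', which would cut off or split the parameter if sent unencoded. The following is a minimal sketch of percent-encoding the URI in POSIX shell before it is appended to the request URL. The helper name urlencode is hypothetical and not part of Stanbol, and the sketch is only intended for ASCII URIs.

```shell
# Percent-encode a string for use as a query parameter value.
# Every character except the RFC 3986 unreserved set is encoded.
# Hypothetical helper, intended for ASCII input only.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case "$c" in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Possible usage (requires a running Stanbol instance):
#   id=$(urlencode "http://cv.iptc.org/newscodes/genre/Exclusive")
#   curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
#       "http://localhost:8080/entityhub/site/iptc/entity?id=$id"
```

The function body runs in a subshell, so the helper variables do not leak into the calling shell.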

For the full documentation of the CRUD interface of the '/entity' endpoint of a ManagedSite please have a look at the RESTful API documentation served by the Web UI of the Stanbol Entityhub.
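The upload examples above cover creating and updating Entities; retrieving and deleting can be sketched in the same way. The helper below only builds the '/entity' URL following the URL pattern of a ManagedSite. The function name entity_url is hypothetical, and the GET and DELETE calls shown in the comments assume that the endpoint accepts the 'id' parameter for these methods as it does for PUT. Please verify this against the RESTful API documentation of your Stanbol instance before relying on it.

```shell
# Build the '/entity' URL of a ManagedSite (hypothetical helper).
# $1 = base URL of the Stanbol instance
# $2 = id of the ManagedSite
# $3 = optional entity URI, appended as the 'id' parameter
entity_url() {
  if [ -n "${3:-}" ]; then
    printf '%s/entityhub/site/%s/entity?id=%s\n' "$1" "$2" "$3"
  else
    printf '%s/entityhub/site/%s/entity\n' "$1" "$2"
  fi
}

# Possible usage (requires a running Stanbol instance):
#   url=$(entity_url http://localhost:8080 iptc \
#       http://cv.iptc.org/newscodes/genre/Exclusive)
#   curl -i -H "Accept: application/rdf+xml" "$url"   # retrieve (assumed)
#   curl -i -X DELETE "$url"                          # delete (assumed)
```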

Configuration of ManagedSites

Currently there is a single implementation of the ManagedSite interface, the YardSite, which uses a Yard instance for managing the entities.

To use a YardSite, users need to configure two services:

  1. Yard: The Entityhub currently includes three Yard implementations: the SolrYard, the ClerezzaYard and the SesameYard. The SolrYard is best suited for use with the Stanbol Enhancer, as it allows very fast label-based retrieval of Entities. If you plan to use the ManagedSite primarily with the Stanbol Enhancer, this is the Yard implementation to choose. The ClerezzaYard and the SesameYard store the managed Entities in a TripleStore. Neither is very efficient for the label-based lookups required by the Entity Linking engines of the Stanbol Enhancer, but both are well suited for more data-focused use cases as well as for use with the Entity Dereference Engines.
  2. YardSite: This configures the ManagedSite. This configuration links to the configured Yard via its id.

Configuration of a SolrYard:

This describes how to configure a SolrYard to be used with a YardSite by using the Configuration tab of the Apache Felix Webconsole at http://{stanbol-instance}/system/console/configMgr.

Typical SolrYard configuration for a YardSite

The above figure shows a typical SolrYard configuration for a YardSite. Important properties are

Configuration of a ClerezzaYard:

This describes how to configure a ClerezzaYard to be used with a YardSite by using the Configuration tab of the Apache Felix Webconsole at http://{stanbol-instance}/system/console/configMgr.

Typical ClerezzaYard configuration for a YardSite

The above figure shows a typical ClerezzaYard configuration for a YardSite. Important properties are

The ClerezzaYard also registers its RDF graph with the Apache Stanbol SPARQL service available at http://{stanbol-instance}/sparql

To query the RDF graph of a ClerezzaYard, specify its configured Graph URI in SPARQL queries posted to the Stanbol SPARQL endpoint:

curl -i -X POST -d "graphuri=http://cv.iptc.org/newscodes" \
    --data-urlencode "query@sparqlQuery.txt" \
    "http://localhost:8080/sparql"

where 'sparqlQuery.txt' refers to a file containing the SPARQL query e.g.

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?concept ?prefLabel ?altLabel ?parent
WHERE {
    ?concept a skos:Concept .
    ?concept skos:prefLabel ?prefLabel .
    OPTIONAL {
        ?concept skos:altLabel ?altLabel .
    }
    OPTIONAL {
        ?concept skos:broader ?parent .
    }
}
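As a small illustration of how such a query file might be produced, the sketch below writes a reduced query to a file using a here-document. The function name write_query and the LIMIT clause are illustrative assumptions and not part of Stanbol. The resulting file can then be posted with the curl call shown above.

```shell
# Write a SPARQL query listing SKOS concepts and their preferred labels.
# $1 = file to write, $2 = maximum number of results
# Hypothetical helper for illustration only.
write_query() {
  cat > "$1" <<EOF
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?concept ?prefLabel
WHERE {
    ?concept a skos:Concept .
    ?concept skos:prefLabel ?prefLabel .
}
LIMIT $2
EOF
}

# Possible usage (requires a running Stanbol instance):
#   write_query sparqlQuery.txt 10
#   curl -i -X POST -d "graphuri=http://cv.iptc.org/newscodes" \
#       --data-urlencode "query@sparqlQuery.txt" \
#       "http://localhost:8080/sparql"
```

Limiting the result size is a convenient way to check that the graph URI is correct before running a query over the whole vocabulary.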

Configuration of a Sesame Yard Site

Since version 0.12.1 (STANBOL-1169), a Sesame Repository registered as an OSGi service can be used as an Entityhub Yard.

The following figure shows an Apache Marmotta Kiwi Repository registered as an OSGi service.

Marmotta Kiwi Repository Service

The highlighted org.openrdf.repository.Repository.id key is used to link a specific Sesame Repository to a Sesame Yard Site. All the other keys are implementation specific and not used by the Entityhub Sesame Yard Site.

When configuring a SesameYard, one needs to set the Repository (org.openrdf.repository.Repository.id key) to the id of the Sesame Repository that should be used as backend. This is especially important if multiple Sesame Repositories are registered as OSGi services.

The following figure shows the configuration dialog for a Sesame Yard. Again the id of the Sesame Repository is highlighted.

Sesame Yard configuration dialog

The Context URIs (org.apache.stanbol.entityhub.yard.sesame.contextUri key) can be used to configure the specific Named Graphs that RDF triples are read from and written to. An empty value is interpreted as the null context. To use the union graph, deactivate the Enable Contexts (org.apache.stanbol.entityhub.yard.sesame.enableContext key) option. In this case all configured Context URIs are ignored.

Configuration of the YardSite

Finally you need to configure the YardSite that uses the previously configured Yard instance (SolrYard, ClerezzaYard or SesameYard). Again, this shows how to configure the YardSite by using the Configuration tab of the Apache Felix Webconsole at http://{stanbol-instance}/system/console/configMgr.

Typical YardSite configuration

The above figure shows the configuration of the YardSite. The important properties are

The Entity Prefix(es) are an optional configuration. They are used by the SiteManager (the "/entityhub/sites" endpoint) to determine whether requested entities can be dereferenced via a registered site. If no prefixes are configured, the SiteManager will try to dereference every request by using this ManagedSite. Configuring them correctly may therefore slightly improve performance by avoiding unnecessary requests.

The Field Mappings can be used to copy property values of created/updated Entities to other properties. The mappings used in the above figure ensure that SKOS preferred/alternate labels, FOAF (Friend of a Friend) names, Dublin Core titles as well as the name property of the schema.org ontology are copied over to rdfs:label. This configuration is the default, as the Stanbol Enhancer uses rdfs:label as the default property for linking entities based on their names.

After completing all those steps you should see a new empty ManagedSite under

http://{stanbol-instance}/entityhub/site/iptc
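A quick way to check this from the command line is to request the site URL and look at the HTTP status code. The sketch below assumes Stanbol runs on localhost port 8080 and that the site id is 'iptc'. The function names site_url and site_status are hypothetical, and which status code your instance returns for a missing site is not covered here.

```shell
# Build the URL of a ManagedSite from the base URL ($1) and site id ($2).
site_url() {
  printf '%s/entityhub/site/%s\n' "$1" "$2"
}

# Print the HTTP status code returned for the site URL.
# curl prints 000 if no connection could be established.
site_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$(site_url "$1" "$2")"
}

# Possible usage (requires a running Stanbol instance):
#   site_status http://localhost:8080 iptc
```

A 200 response would suggest that the ManagedSite is registered under the expected id.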