ManagedSite
A ManagedSite allows users to manage a collection of Entities by using the RESTful API of the Entityhub. Unlike the ReferencedSite implementation, it does not refer to remote services. Therefore all changes to Entities managed by a ManagedSite are performed via the RESTful API of the Entityhub.
Users can configure multiple ManagedSites with the Stanbol Entityhub. They are identified by their id and share the id-space with other Sites (e.g. ReferencedSites). The RESTful services of a ManagedSite are available via the URL pattern
http://{stanbol-instance}/entityhub/site/{siteId}
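When scripting against a ManagedSite, the URL pattern above can be wrapped in a couple of shell helpers. This is a minimal sketch, assuming a Stanbol instance at http://localhost:8080 and the usual Entityhub lookup of a single Entity via GET .../entity?id={entityUri}. The variable STANBOL_URL and the functions site_url and get_entity are hypothetical helpers, and the CURL variable only exists so that the command can be replaced (e.g. with echo) for a dry run.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Stanbol): build ManagedSite URLs and look up an Entity.
# Base URL of the Stanbol instance; assumed default, override via the environment.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"
# Command used for HTTP calls; set CURL=echo for a dry run that only prints the arguments.
CURL="${CURL:-curl}"

# site_url SITE_ID -> prints http://{stanbol-instance}/entityhub/site/{siteId}
site_url() {
    printf '%s/entityhub/site/%s\n' "$STANBOL_URL" "$1"
}

# get_entity SITE_ID ENTITY_URI -> fetches one Entity as RDF/XML.
# Assumption: the site answers GET /entity?id={uri}. The -G option turns the
# URL-encoded id into a query parameter of a GET request.
get_entity() {
    "$CURL" -s -G -H "Accept: application/rdf+xml" \
        --data-urlencode "id=$2" "$(site_url "$1")/entity"
}
```

For the scenario used in this documentation, get_entity iptc http://cv.iptc.org/newscodes/genre/Exclusive would request that genre from the site with the id 'iptc'.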
NOTE: To make this documentation less abstract, it uses a scenario in which someone wants to manage the IPTC Descriptive NewsCodes with a ManagedSite. Typical Stanbol users will want to manage their own Entities (e.g. Tags/Categories of their CMS) instead.
Manage Entities by using RESTful services
The RESTful API of ManagedSites is the same as that of other Sites, except that the "/entity" endpoint also supports creating, updating and deleting Entities.
The following example shows how to upload a SKOS vocabulary to a ManagedSite:
curl -i -X PUT -H "Content-Type: application/rdf+xml" -T subject-code.rdf \
    "http://localhost:8080/entityhub/site/iptc/entity"
This example assumes that Stanbol is running on 'localhost' port '8080' and that a ManagedSite with the id 'iptc' was configured. The uploaded file 'subject-code.rdf' contains the IPTC subject codes. To also upload the vocabulary containing the genres, call
curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
    "http://localhost:8080/entityhub/site/iptc/entity"
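If a site is populated from several vocabulary files, the individual PUT calls can be issued in a loop. The following is a sketch under the same assumptions as the curl calls above (Stanbol on localhost:8080, a site with the id 'iptc'); upload_rdf, STANBOL_URL and CURL are hypothetical names. Adding -f makes curl fail on HTTP error responses, so the loop stops at the first file that was rejected.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stanbol): upload several RDF files to a ManagedSite.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"
CURL="${CURL:-curl}"

# upload_rdf SITE_ID FILE... -> one PUT request per file, stops at the first failure.
upload_rdf() {
    site_id="$1"
    shift
    for file in "$@"; do
        # -f: return a non-zero exit status for HTTP errors (status >= 400)
        "$CURL" -f -s -X PUT -H "Content-Type: application/rdf+xml" \
            -T "$file" "$STANBOL_URL/entityhub/site/$site_id/entity" || return 1
    done
}
```

Usage: upload_rdf iptc subject-code.rdf genre.rdf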
Uploading a file this way creates/updates all Entities contained in the RDF data. To ensure that only a single Entity is created/updated, specify the 'id' parameter:
curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
    "http://localhost:8080/entityhub/site/iptc/entity?id=http://cv.iptc.org/newscodes/genre/Exclusive"
This ignores all other RDF data in the file and only updates the 'genre:Exclusive' Entity.
For the full documentation of the CRUD interface of the '/entity' endpoint of a ManagedSite please have a look at the RESTful API documentation served by the Web UI of the Stanbol Entityhub.
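Since the '/entity' endpoint of a ManagedSite also supports deleting Entities, removing a single Entity could look like the sketch below. It assumes that a DELETE request accepts the same 'id' parameter as the PUT request with the 'id' parameter shown earlier; please verify this against the RESTful API documentation of your Stanbol instance. delete_entity, STANBOL_URL and CURL are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stanbol): delete a single Entity from a ManagedSite.
# Assumption: DELETE /entityhub/site/{siteId}/entity?id={entityUri} is accepted.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"
CURL="${CURL:-curl}"

# delete_entity SITE_ID ENTITY_URI
delete_entity() {
    # -G appends the URL-encoded id as query parameter; -X DELETE then
    # replaces the GET method that -G would otherwise use.
    "$CURL" -i -G -X DELETE --data-urlencode "id=$2" \
        "$STANBOL_URL/entityhub/site/$1/entity"
}
```

Usage: delete_entity iptc http://cv.iptc.org/newscodes/genre/Exclusive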
Configuration of ManagedSites
Currently there is a single implementation of the ManagedSite interface, the YardSite, which uses a Yard instance for managing the entities.
To use a YardSite, users need to configure two services:
- Yard: The Entityhub currently includes three Yard implementations: the SolrYard, the ClerezzaYard and the SesameYard. The SolrYard is the best choice for use with the Stanbol Enhancer, as it allows very fast label-based retrieval of Entities. So if you plan to use the ManagedSite primarily with the Stanbol Enhancer, this is definitely the Yard implementation to choose. The ClerezzaYard and the SesameYard store the managed Entities in a triple store. Neither is very efficient for the label-based lookups required by the Entity Linking engines of the Stanbol Enhancer, but both are well suited for more data-focused use cases as well as for use with the Entity Dereference Engines.
- YardSite: This configures the ManagedSite. This configuration links to the configured Yard via its id.
Configuration of a SolrYard:
This describes how to configure a SolrYard to be used with a YardSite by using the Configuration tab of the Apache Felix Webconsole http://{stanbol-instance}/system/console/configMgr.
The above figure shows a typical SolrYard configuration for a YardSite. The important properties are:
- ID: This MUST be unique among all Yards. It is recommended to use "{siteId}Yard".
- Solr Index/Core: The name of the SolrCore used to store the data. It is recommended to use the same name as the {siteId}, because the RESTful API of the SolrCore is published under http://{stanbol-instance}/solr/default/{solrCore}. Using the same name for {siteId} and {solrCore} therefore makes it easier to map the RESTful API of the SolrCore to the ManagedSite published under http://{stanbol-instance}/entityhub/site/{siteId}.
- Use default SolrCore configuration: If enabled, the SolrCore is created automatically using the default configuration. Users will typically want to use this option. Only users who need a special SolrCore configuration have to deactivate it and provide a {solrCore}.solrindex.zip archive containing that configuration in the {stanbol-workingdir}/stanbol/datafiles directory. See the Managing Solr Indexes section for detailed information.
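The naming recommendation and the location of a custom configuration archive can be illustrated with two small helpers. This is a sketch, assuming that the SolrCore was given the same name as the site and that Stanbol runs on localhost:8080; site_and_core_urls, install_solr_archive and STANBOL_URL are hypothetical names, and the archive itself must already contain a valid SolrCore configuration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Stanbol) for the SolrYard naming conventions.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"

# site_and_core_urls SITE_ID -> prints the ManagedSite URL and the SolrCore URL,
# assuming the SolrCore has the same name as the site.
site_and_core_urls() {
    printf '%s/entityhub/site/%s\n' "$STANBOL_URL" "$1"
    printf '%s/solr/default/%s\n' "$STANBOL_URL" "$1"
}

# install_solr_archive WORKING_DIR CORE_NAME ARCHIVE
# Copies ARCHIVE to {stanbol-workingdir}/stanbol/datafiles/{solrCore}.solrindex.zip
install_solr_archive() {
    target_dir="$1/stanbol/datafiles"
    mkdir -p "$target_dir" || return 1
    cp "$3" "$target_dir/$2.solrindex.zip"
}
```

For the 'iptc' scenario, site_and_core_urls iptc prints both URLs, which differ only in the /entityhub/site/ versus /solr/default/ path segment.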
Configuration of a ClerezzaYard:
This describes how to configure a ClerezzaYard to be used with a YardSite by using the Configuration tab of the Apache Felix Webconsole http://{stanbol-instance}/system/console/configMgr.
The above figure shows a typical ClerezzaYard configuration for a YardSite. The important properties are:
- ID: This MUST be unique among all Yards. It is recommended to use "{siteId}Yard".
- Graph URI: The URI of the named graph used to store the RDF data. If a graph with this URI is already present, it is reused by this Yard. Otherwise an empty graph with this URI is created using the Clerezza TcManager. If this field is empty, a URN is used as the default graph URI.
The ClerezzaYard also registers its RDF graph with the Apache Stanbol SPARQL service available at http://{stanbol-instance}/sparql.
To query the RDF graph of a ClerezzaYard, you need to specify its configured Graph URI in SPARQL queries posted to the Stanbol SPARQL endpoint:
curl -i -X POST -d "graphuri=http://cv.iptc.org/newscodes" \
    --data-urlencode "query@sparqlQuery.txt" \
    "http://localhost:8080/sparql"
where 'sparqlQuery.txt' refers to a file containing the SPARQL query, e.g.:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?concept ?prefLabel ?altLabel ?parent
WHERE {
    ?concept a skos:Concept .
    ?concept skos:prefLabel ?prefLabel .
    OPTIONAL { ?concept skos:altLabel ?altLabel . }
    OPTIONAL { ?concept skos:broader ?parent . }
}
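If several ClerezzaYards are queried, the curl call can be wrapped in a function that takes the Graph URI and the query file as arguments. This is a sketch, assuming the same endpoint and form parameters as the curl call above; sparql_query, STANBOL_URL and CURL are hypothetical names. Unlike the -d option, --data-urlencode also encodes the Graph URI, so characters such as '&' or '#' in a URI cannot break the form data.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stanbol): POST a SPARQL query against the
# named graph of a ClerezzaYard via the Stanbol SPARQL endpoint.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"
CURL="${CURL:-curl}"

# sparql_query GRAPH_URI QUERY_FILE
sparql_query() {
    if [ ! -r "$2" ]; then
        echo "cannot read query file: $2" >&2
        return 1
    fi
    # Both form fields are URL-encoded; 'query@FILE' reads the value from FILE.
    "$CURL" -s -X POST --data-urlencode "graphuri=$1" \
        --data-urlencode "query@$2" "$STANBOL_URL/sparql"
}
```

Usage: sparql_query http://cv.iptc.org/newscodes sparqlQuery.txt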
Configuration of a Sesame Yard Site
With STANBOL-1169 (since version 0.12.1), a Sesame Repository registered as an OSGi service can be used as an Entityhub Yard.
The following figure shows an Apache Marmotta Kiwi Repository registered as an OSGi service.
The highlighted org.openrdf.repository.Repository.id key is used to link a specific Sesame Repository to a Sesame Yard Site. All other keys are implementation specific and are not used by the Entityhub Sesame Yard Site.
When configuring a SesameYard, set the Repository (org.openrdf.repository.Repository.id key) to the id of the Sesame Repository you would like to use as backend. This is especially important if multiple Sesame Repositories are registered as OSGi services.
The following figure shows the configuration dialog for a Sesame Yard. Again the id of the Sesame Repository is highlighted.
The Context URIs (org.apache.stanbol.entityhub.yard.sesame.contextUri key) can be used to configure the specific Named Graphs that RDF triples are read from and written to. An empty value is interpreted as the null context. To use the union graph, deactivate the Enable Contexts (org.apache.stanbol.entityhub.yard.sesame.enableContext key) option. In this case all configured Context URIs are ignored.
Configuration of the YardSite
Finally, you need to configure the YardSite that uses the previously configured Yard instance (SolrYard, ClerezzaYard or SesameYard). Again, this shows how to configure the YardSite by using the Configuration tab of the Apache Felix Webconsole http://{stanbol-instance}/system/console/configMgr.
The above figure shows the configuration of the YardSite. The important properties are
- ID: This is the {siteId} used to map this ManagedSite to the RESTful API of the Stanbol Entityhub. Make sure that the ID is unique over all configured Sites.
- Yard ID: The ID of the Yard configured in the previous step. If no Yard with that ID is active, the ManagedSite is not initialized and is therefore not available on the RESTful API.
The Entity Prefix(es) are optional. They are used by the SiteManager (the "/entityhub/sites" endpoint) to decide whether a requested Entity can be dereferenced via a registered site. If no prefix is configured, the SiteManager will try to dereference every request using this ManagedSite. Configuring the prefixes correctly may therefore slightly improve performance by avoiding unnecessary requests.
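The effect of the Entity Prefix(es) can be pictured with a small model: a site is only asked for an Entity if the requested URI starts with one of its configured prefixes, while a site without prefixes is asked for every request. This is a simplified sketch of the behaviour described above, not the SiteManager's actual code; site_accepts is a hypothetical name.

```shell
#!/bin/sh
# Simplified model (not Stanbol code) of prefix-based site selection.
# site_accepts ENTITY_URI [PREFIX...] -> exit status 0 if the site should be asked.
site_accepts() {
    uri="$1"
    shift
    # No prefixes configured: the site is consulted for every request.
    [ "$#" -eq 0 ] && return 0
    for prefix in "$@"; do
        # The quoted prefix is matched literally, the trailing * matches the rest.
        case "$uri" in
            "$prefix"*) return 0 ;;
        esac
    done
    return 1
}
```

With a prefix such as http://cv.iptc.org/newscodes/ configured for the 'iptc' site, a request for a DBpedia resource would no longer be forwarded to that site.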
The Field Mappings can be used to copy property values of created/updated Entities to other properties. The mappings used in the above figure ensure that SKOS preferred/alternate labels, FOAF (Friend of a Friend) names, Dublin Core titles as well as the name property of the schema.org ontology are copied to rdfs:label. This is the default configuration because the Stanbol Enhancer uses rdfs:label as the default property for linking entities based on their names.
After completing all those steps you should see a new empty ManagedSite under
http://{stanbol-instance}/entityhub/site/iptc
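When the setup is scripted, this last step can be checked automatically. The sketch assumes that the site URL answers with HTTP status 200 once the YardSite is active and with a different status (e.g. 404) while it is not, for instance because the Yard referenced by the Yard ID is not active. site_available, STANBOL_URL and CURL are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stanbol): check that a ManagedSite is published.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"
CURL="${CURL:-curl}"

# site_available SITE_ID -> exit status 0 if the site URL answers with HTTP 200.
site_available() {
    # -o /dev/null discards the body, -w prints only the HTTP status code
    status="$("$CURL" -s -o /dev/null -w '%{http_code}' "$STANBOL_URL/entityhub/site/$1")"
    [ "$status" = "200" ]
}
```

Usage: site_available iptc && echo "ManagedSite 'iptc' is available"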