The Geonames Enhancement Engine
This engine creates fise:EntityAnnotations based on the http://geonames.org dataset. It does not directly work on the parsed content, but processes named entities extracted by some NLP (natural language processing) engine. This engine creates EntityAnnotations for Features found for named entities in the geonames.org data set. In addition it adds EntityAnnotations for the continent, country and administrative regions for entities with an high confidence level.
Processed Annotations (Input)
This engine consumes fise:TextAnnotations of type dbpedia:Place. More concretely it filters for enhancements which confirm to the following two requirements and consumes the text selected by the TextAnnotations:
?textAnnotation rdf:type fise:TextAnnotation . ?textAnnotation dc:type dbpedia:Place ?textAnnotation fise:selected-text ?text
Here an example for such an TextAnnotations selecting the text "Vienna" form the content "The community Workshop will take place in Vienna".
urn:enhancement:text-enhancement:id1 a fise:TextAnnotation , fise:Enhancement ; dc:type dbpedia:Place ; fise:selected-text "Vienna"^^xsd:string ; fise:selection-context "The community Workshop will take place in Vienna"^^xsd:string ; fise:start "46"^^xsd:int ; fise:end "52"^^xsd:int ; fise:confidence "0.9773640902587215"^^xsd:double ; fise:extracted-from urn:content-item:id1 .
Typically such enhancements are created by engines that provide named entity extraction based on some natural language processing framework.
Created Enhancements (Output)
The LocationEnhancementEngine creates two types of EntityAnnotations. First it suggests Entities for processed TextAnnotations and second it creates EntityAnnotations for the hierarchy of regions the suggested Entities are located in. Suggested Entities are connected with the "dc:relation" attribute to the TextAnnotation they enhance. EntityAnnotations representing the hierarchydefine a dc:requires attribute to the EntityAnnotation.
Entity Suggestions
Entity suggestions are EntityEnhancements that suggest Features of the geonames.org dataset for an processed TextAnnotation. This suggestions are currently only calculated based on the fise:selected-text of the TextAnnotation.
The following example shows three EntityAnnotations for the TextAnnotation used in the above example. See the fise:relation statements at the end of each of the two EntityAnnotations.
The first Entity found in the geonames.orf dataset is the capital city in Austria with an confidence level of 1.0:
urn:enhancement:entity-enhancement:id1 a fise:EntityAnnotation , fise:Enhancement ; fise:confidence "1.0"^^xsd:double ; fise:entity-label "Vienna"^^xsd:string ; fise:entity-reference http://sws.geonames.org/2761369/ ; fise:entity-type geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPLC ; fise:extracted-from urn:content-item:id1 ; dc:relation urn:enhancement:text-enhancement:id1 .
With lower confidence levels there are a lot of other populated places with the name "Vienna" found in the geonames.org dataset.
urn:enhancement:entity-enhancement:id2 a fise:EntityAnnotation , fise:Enhancement ; fise:confidence "0.42163702845573425"^^xsd:double ; fise:entity-label "Vienna"^^xsd:string ; fise:entity-reference http://sws.geonames.org/4496671/ ; fise:entity-type geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ; fise:extracted-from urn:content-item:id1 ; dc:relation urn:enhancement:text-enhancement:id1 . urn:enhancement:entity-enhancement:id3 a fise:EntityAnnotation , fise:Enhancement ; fise:confidence "0.42163702845573425"^^xsd:double ; fise:entity-label "Vienna"^^xsd:string ; fise:entity-reference http://sws.geonames.org/4825976/ ; fise:entity-type geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ; fise:extracted-from urn:content-item:id1 ; dc:relation urn:enhancement:text-enhancement:id1 .
Entity Hierarchy Enhancements
Entity Hierarchy Enhancements describe the regions that contain suggested Features based on the geonames.org dataset. Enhancements describing this hierarchy are added for all suggested entities with a confidence level above the value of "eu.iksproject.fise.engines.geonames.locationEnhancementEngine.min-hierarchy-score".
The default value for this property is 0.7. The hierarchy web service provided by geonames.org is used to calculate the regions. The following example shows the entity hierarchy enhancements for the suggested entity for Vienna (Autria). Please note the dc:requires relation to this EntityAnnotation at the end of each of the following enhancement.
Continent: Europe
First the enhancement for the continent Europe:
urn:enhancement:entity-hierarchy-enhancement:id1 a fise:EntityAnnotation , fise:Enhancement ; fise:confidence "0.42163702845573425"^^xsd:double ; fise:entity-label "Europe"^^xsd:string ; fise:entity-reference http://sws.geonames.org/6255148/ ; fise:entity-type geonames:Feature , dbpedia:Place, geonames:L.CONT ; fise:extracted-from urn:content-item:id1 ; dc:requires urn:enhancement:entity-enhancement:id1 .
Country: Austria
Next the enhancement for the country "Austria", classified as an independent political entry within geonames.org
urn:enhancement:entity-hierarchy-enhancement:id2 a fise:EntityAnnotation , fise:Enhancement ; fise:confidence "0.42163702845573425"^^xsd:double ; fise:entity-label "Austria"^^xsd:string ; fise:entity-reference http://sws.geonames.org/2782113/ ; fise:entity-type geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.PCLI ; fise:extracted-from urn:content-item:id1 ; dc:requires urn:enhancement:entity-enhancement:id1 .
A.ADM1 - A county
Now three enhancements describing the different hierarchies of administrative regions within Austria. First the "Bundesland", next the "Stadtteil" and last the "Gemeindebezirk".
urn:enhancement:entity-hierarchy-enhancement:id3 a fise:EntityAnnotation , fise:Enhancement ; fise:confidence "0.42163702845573425"^^xsd:double ; fise:entity-label "Vienna"^^xsd:string ; fise:entity-reference http://sws.geonames.org/2761367/ ; fise:entity-type geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM1 ; fise:extracted-from urn:content-item:id1 ; dc:requires urn:enhancement:entity-enhancement:id1 .
A.ADM2 - A city
urn:enhancement:entity-hierarchy-enhancement:id4 a fise:EntityAnnotation , fise:Enhancement ; fise:confidence "0.42163702845573425"^^xsd:double ; fise:entity-label "Politischer Bezirk Wien (Stadt)"^^xsd:string ; fise:entity-reference http://sws.geonames.org/2761333/ ; fise:entity-type geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM2 ; fise:extracted-from urn:content-item:id1 ; dc:requires urn:enhancement:entity-enhancement:id1 .
A.ADM3 - A village
urn:enhancement:entity-hierarchy-enhancement:id5 a fise:EntityAnnotation , fise:Enhancement ; fise:confidence "0.42163702845573425"^^xsd:double ; fise:entity-label "Gemeindebezirk Innere Stadt"^^xsd:string ; fise:entity-reference http://sws.geonames.org/2775259/ ; fise:entity-type geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM3 ; fise:extracted-from urn:content-item:id1 ; dc:requires urn:enhancement:entity-enhancement:id1 .
The last two hierarchy levels are no longer valid for the meaning of "Vienna" as selected by the TextAnnotation, but added, because the geonames.org dataset locations the Feature of cities exactly in the center. However if the TextAnnotation would describe a precise address such hierarchy levels would completely make sense.
Configuration
The LocationEnhancementEngine provides six configurations
The first three can be used to optimise the behaviour of the Engine
- Minimum score (default = 0.33): The minimum score (confidence) that is required for entity suggestions
- Maximum Locations (default = 3): The maximum numbers of entity suggestions added (regardless if there would be more results with a score > min-score.
- Maximum Locations (default = 0.7): The minimum score (confidence) that is required that hierarchy enhancements are added for an suggested entity. To add hierarchy enhancements for all suggested entities min-hierarchy-score needs to be set to a value smaller equals than min-score.
The other three are used to configure the configured geonames.org server
- geonames.org Server: The URL of the geonames.org service. The default is the free geonames.org webserver that works without user authentication. There is a second free server at http://api.geonames.org/ that requires to setup a free user account. Users with a premium account will require to add here there own URL
- User Name: Thats the name of the account (can be empty if the configured server does not require user authentication
- Token: The token is usually the password of the user account.
HOWTO setup a free user account:
Such an account is required to be able to use the http://api.geonames.org/ server that should support better performance and higher uptime than the default free server available at http://ws.geonames.org/.
To setup the free account:
- go to www.geonames.org. In the right top corner you will find a "login" link that is also used to create new accounts
- choose a username and pwd. You will get an confirmation mail at the provided email address. When choosing the password consider, that it will be sent unencrypted (as token) with every webservice Request. Therefore it is strongly suggested to do not use an password that is used for any other account!
- confirm the account
- IMPORTANT: You need to activate the free web service for the account via http://www.geonames.org/manageaccount. Log in first, go back to this site. At the botton you should find the text "the account is not yet enabled to use the free web services. Click here to enable"
If you do not complete step (4) requests with your account will result an IOExceptions with the message
"user account not enabled to use the free webservice. Please enable it on your account page: http://www.geonames.org/manageaccount"