This project has retired. For details please refer to its Attic page.
Apache Stanbol - The Geonames Enhancement Engine

The Geonames Enhancement Engine

This engine creates fise:EntityAnnotations based on the http://geonames.org dataset. It does not directly work on the parsed content, but processes named entities extracted by some NLP (natural language processing) engine. This engine creates EntityAnnotations for Features found for named entities in the geonames.org data set. In addition it adds EntityAnnotations for the continent, country and administrative regions for entities with an high confidence level.

Processed Annotations (Input)

This engine consumes fise:TextAnnotations of type dbpedia:Place. More concretely it filters for enhancements which confirm to the following two requirements and consumes the text selected by the TextAnnotations:

?textAnnotation rdf:type fise:TextAnnotation .
?textAnnotation dc:type dbpedia:Place
?textAnnotation fise:selected-text ?text

Here an example for such an TextAnnotations selecting the text "Vienna" form the content "The community Workshop will take place in Vienna".

urn:enhancement:text-enhancement:id1
    a       fise:TextAnnotation , fise:Enhancement ;
    dc:type
         dbpedia:Place ;
    fise:selected-text
         "Vienna"^^xsd:string ;
    fise:selection-context
         "The community Workshop will take place in Vienna"^^xsd:string ;
    fise:start
         "46"^^xsd:int ;
    fise:end
         "52"^^xsd:int ;
    fise:confidence
         "0.9773640902587215"^^xsd:double ;
    fise:extracted-from
         urn:content-item:id1 .

Typically such enhancements are created by engines that provide named entity extraction based on some natural language processing framework.

Created Enhancements (Output)

The LocationEnhancementEngine creates two types of EntityAnnotations. First it suggests Entities for processed TextAnnotations and second it creates EntityAnnotations for the hierarchy of regions the suggested Entities are located in. Suggested Entities are connected with the "dc:relation" attribute to the TextAnnotation they enhance. EntityAnnotations representing the hierarchydefine a dc:requires attribute to the EntityAnnotation.

Entity Suggestions

Entity suggestions are EntityEnhancements that suggest Features of the geonames.org dataset for an processed TextAnnotation. This suggestions are currently only calculated based on the fise:selected-text of the TextAnnotation.

The following example shows three EntityAnnotations for the TextAnnotation used in the above example. See the fise:relation statements at the end of each of the two EntityAnnotations.

The first Entity found in the geonames.orf dataset is the capital city in Austria with an confidence level of 1.0:

urn:enhancement:entity-enhancement:id1
    a      fise:EntityAnnotation , fise:Enhancement ;
    fise:confidence
         "1.0"^^xsd:double ;
    fise:entity-label
         "Vienna"^^xsd:string ;
    fise:entity-reference
         http://sws.geonames.org/2761369/ ;
    fise:entity-type
         geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPLC ;
    fise:extracted-from
         urn:content-item:id1 ;
   dc:relation
         urn:enhancement:text-enhancement:id1 .

With lower confidence levels there are a lot of other populated places with the name "Vienna" found in the geonames.org dataset.

urn:enhancement:entity-enhancement:id2
    a      fise:EntityAnnotation , fise:Enhancement ;
    fise:confidence
         "0.42163702845573425"^^xsd:double ;
    fise:entity-label
         "Vienna"^^xsd:string ;
    fise:entity-reference
         http://sws.geonames.org/4496671/ ;
    fise:entity-type
         geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ;
    fise:extracted-from
         urn:content-item:id1 ;
   dc:relation
         urn:enhancement:text-enhancement:id1 .

urn:enhancement:entity-enhancement:id3
    a      fise:EntityAnnotation , fise:Enhancement ;
    fise:confidence
         "0.42163702845573425"^^xsd:double ;
    fise:entity-label
         "Vienna"^^xsd:string ;
    fise:entity-reference
         http://sws.geonames.org/4825976/ ;
    fise:entity-type
         geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ;
    fise:extracted-from
         urn:content-item:id1 ;
    dc:relation
         urn:enhancement:text-enhancement:id1 .

Entity Hierarchy Enhancements

Entity Hierarchy Enhancements describe the regions that contain suggested Features based on the geonames.org dataset. Enhancements describing this hierarchy are added for all suggested entities with a confidence level above the value of "eu.iksproject.fise.engines.geonames.locationEnhancementEngine.min-hierarchy-score".

The default value for this property is 0.7. The hierarchy web service provided by geonames.org is used to calculate the regions. The following example shows the entity hierarchy enhancements for the suggested entity for Vienna (Autria). Please note the dc:requires relation to this EntityAnnotation at the end of each of the following enhancement.

Continent: Europe

First the enhancement for the continent Europe:

urn:enhancement:entity-hierarchy-enhancement:id1
    a      fise:EntityAnnotation , fise:Enhancement ;
    fise:confidence
         "0.42163702845573425"^^xsd:double ;
    fise:entity-label
         "Europe"^^xsd:string ;
    fise:entity-reference
         http://sws.geonames.org/6255148/ ;
    fise:entity-type
         geonames:Feature , dbpedia:Place, geonames:L.CONT ;
    fise:extracted-from
         urn:content-item:id1 ;
   dc:requires
         urn:enhancement:entity-enhancement:id1 .

Country: Austria

Next the enhancement for the country "Austria", classified as an independent political entry within geonames.org

urn:enhancement:entity-hierarchy-enhancement:id2
    a      fise:EntityAnnotation , fise:Enhancement ;
    fise:confidence
         "0.42163702845573425"^^xsd:double ;
    fise:entity-label
         "Austria"^^xsd:string ;
    fise:entity-reference
         http://sws.geonames.org/2782113/ ;
    fise:entity-type
         geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.PCLI ;
    fise:extracted-from
         urn:content-item:id1 ;
    dc:requires
         urn:enhancement:entity-enhancement:id1 .

A.ADM1 - A county

Now three enhancements describing the different hierarchies of administrative regions within Austria. First the "Bundesland", next the "Stadtteil" and last the "Gemeindebezirk".

urn:enhancement:entity-hierarchy-enhancement:id3
    a      fise:EntityAnnotation , fise:Enhancement ;
    fise:confidence
         "0.42163702845573425"^^xsd:double ;
    fise:entity-label
         "Vienna"^^xsd:string ;
    fise:entity-reference
         http://sws.geonames.org/2761367/ ;
    fise:entity-type
         geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM1 ;
    fise:extracted-from
         urn:content-item:id1 ;
    dc:requires
         urn:enhancement:entity-enhancement:id1 .

A.ADM2 - A city

urn:enhancement:entity-hierarchy-enhancement:id4
    a      fise:EntityAnnotation , fise:Enhancement ;
    fise:confidence
         "0.42163702845573425"^^xsd:double ;
    fise:entity-label
         "Politischer Bezirk Wien (Stadt)"^^xsd:string ;
    fise:entity-reference
         http://sws.geonames.org/2761333/ ;
    fise:entity-type
         geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM2 ;
    fise:extracted-from
         urn:content-item:id1 ;
   dc:requires
         urn:enhancement:entity-enhancement:id1 .

A.ADM3 - A village

urn:enhancement:entity-hierarchy-enhancement:id5
    a      fise:EntityAnnotation , fise:Enhancement ;
    fise:confidence
         "0.42163702845573425"^^xsd:double ;
    fise:entity-label
         "Gemeindebezirk Innere Stadt"^^xsd:string ;
    fise:entity-reference
         http://sws.geonames.org/2775259/ ;
    fise:entity-type
         geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM3 ;
    fise:extracted-from
         urn:content-item:id1 ;
   dc:requires
         urn:enhancement:entity-enhancement:id1 .

The last two hierarchy levels are no longer valid for the meaning of "Vienna" as selected by the TextAnnotation, but added, because the geonames.org dataset locations the Feature of cities exactly in the center. However if the TextAnnotation would describe a precise address such hierarchy levels would completely make sense.

Configuration

The LocationEnhancementEngine provides six configurations

The first three can be used to optimise the behaviour of the Engine

The other three are used to configure the configured geonames.org server

HOWTO setup a free user account:

Such an account is required to be able to use the http://api.geonames.org/ server that should support better performance and higher uptime than the default free server available at http://ws.geonames.org/.

To setup the free account:

  1. go to www.geonames.org. In the right top corner you will find a "login" link that is also used to create new accounts
  2. choose a username and pwd. You will get an confirmation mail at the provided email address. When choosing the password consider, that it will be sent unencrypted (as token) with every webservice Request. Therefore it is strongly suggested to do not use an password that is used for any other account!
  3. confirm the account
  4. IMPORTANT: You need to activate the free web service for the account via http://www.geonames.org/manageaccount. Log in first, go back to this site. At the botton you should find the text "the account is not yet enabled to use the free web services. Click here to enable"

If you do not complete step (4) requests with your account will result an IOExceptions with the message

"user account not enabled to use the free webservice. Please enable it on your account page: http://www.geonames.org/manageaccount"