Co-Mention Engine

The Co-Mention engine aims to link initial mentions of Entities with later references in the Text.

A typical example is a person who is mentioned only by family name after an initial mention with the full name, e.g.

... Barack Obama gave a talk to members of the Labor Union ... Obama specially mentioned ...

NOTE: This Engine does NOT provide or use NLP co-reference support (e.g. linking a Pronoun with the Entity it stands for). Its purpose is to (1) link follow-up mentions of Entities with the original mention and (2) add the suggestions of the initial mention to follow-up mentions.
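
The two purposes named in the note can be illustrated with a minimal sketch. It is not the engine's implementation: the function name `find_co_mentions`, the dictionary layout, and the `urn:example:` identifier are hypothetical, and plain whole-word string matching stands in for the NLP-based token matching the engine actually relies on.

```python
import re


def find_co_mentions(text, initial_mention, suggestions):
    """Find later references to single tokens of `initial_mention`.

    Each follow-up mention receives a copy of the suggestions of the
    initial mention. Simplified illustration only (hypothetical helper).
    """
    start = text.find(initial_mention)
    if start < 0:
        return []  # the initial mention does not occur in the text
    search_from = start + len(initial_mention)
    results = []
    for token in initial_mention.split():
        # Whole-word search, restricted to the text after the initial mention.
        pattern = r"\b" + re.escape(token) + r"\b"
        for match in re.finditer(pattern, text[search_from:]):
            results.append({
                "mention": token,
                "start": search_from + match.start(),
                "suggestions": list(suggestions),
            })
    return sorted(results, key=lambda r: r["start"])


text = ("Barack Obama gave a talk to members of the Labor Union. "
        "Obama specially mentioned the members.")
# "urn:example:barack-obama" is a made-up entity identifier.
for co_mention in find_co_mentions(text, "Barack Obama", ["urn:example:barack-obama"]):
    print(co_mention)
```

In this sketch the later "Obama" is reported as a follow-up mention and carries the suggestion that was attached to "Barack Obama".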

Configuration

Because this engine uses the entity linking functionality of the EntityLinkingEngine, its configuration supports similar properties.

Name (stanbol.enhancer.engine.name): The name of the Enhancement Engine. This name is used to refer to an EnhancementEngine in EnhancementChains.
Service Ranking (service.ranking): If multiple enhancement engines use the same name, only the one with the highest ranking will be used.
Case Sensitivity (enhancer.engines.linking.caseSensitive): Boolean switch that activates or deactivates case sensitive matching. It is important to understand that even with case sensitivity activated, an Entity with a label such as "Anaconda" will be suggested for the mention of "anaconda" in the text. The main difference is the confidence value of such a suggestion, because with case sensitivity activated the starting letters "A" and "a" are NOT considered to be matching. See the second technical part for details about the matching process. Case Sensitivity is deactivated by default. Activating it is recommended if controlled vocabularies contain abbreviations similar to commonly used words, e.g. CAN for Canada.
Proper Noun Linking (enhancer.engines.linking.properNounsState): Enables or disables proper noun linking when searching for co-mentions. By default this is disabled so that Common Nouns are also considered when searching for co-mentions. However, for Vocabularies that only contain Proper Nouns (Persons, Organizations, ...) enabling this might be useful. For the full documentation of this feature see the Text Processing Configuration section of the EntityLinking engine.
Processed Languages (enhancer.engines.linking.processedLanguages): Allows detailed configuration of how NLP processing results are consumed by the Co-Mention engine. For the full documentation of this feature see the Text Processing Configuration section of the EntityLinking engine.
Adjust Existing Confidence (enhancer.engines.comention.adjustExistingConfidence): If the engine detects a co-mention for an existing fise:TextAnnotation, it can adjust the confidence values of existing suggestions. This property takes values in the range [0..1). Confidence values of existing suggestions are multiplied by 1-{value}. Configuring 0.0 deactivates this feature. The default is 0.33. See STANBOL-1219 for details and an example.
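
As a worked illustration of the Adjust Existing Confidence arithmetic, the following sketch multiplies each existing confidence by 1-{value}. The function name and the sample confidence values are made up for this example. Only the factor, the valid range [0..1) and the default of 0.33 come from the description above.

```python
def adjust_existing_confidence(confidences, adjust=0.33):
    """Scale the confidences of existing suggestions by (1 - adjust).

    Hypothetical helper: it mirrors the documented formula only,
    not the engine's code.
    """
    if not 0.0 <= adjust < 1.0:
        raise ValueError("adjust must be in the range [0..1)")
    return [c * (1.0 - adjust) for c in confidences]


existing = [0.9, 0.6]  # made-up confidences of existing suggestions
print(adjust_existing_confidence(existing))       # scaled by 0.67
print(adjust_existing_confidence(existing, 0.0))  # 0.0 leaves values unchanged
```

With the default of 0.33, a confidence of 0.9 becomes roughly 0.603 and 0.6 becomes roughly 0.402, while a setting of 0.0 keeps the original values.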

The following supported properties are not included in the Felix Webconsole configuration dialog and can only be set via OSGi configuration files. See the Entity Linking Engine configuration for the full documentation of these properties:

Min Search Token Length (enhancer.engines.linking.minSearchTokenLength)
Minimum Token Match Score (enhancer.engines.linking.minTokenScore)
Lemma based Matching (enhancer.engines.linking.lemmaMatching)
Max Search Token Distance (enhancer.engines.linking.maxSearchTokenDistance)
Max Search Tokens (enhancer.engines.linking.maxSearchTokens)

The following properties of the EntityLinking engine are ignored or set to fixed values:

Type Mappings (enhancer.engines.linking.typeMappings): The Co-Mention engine uses the dc:type values of the initial mention. Therefore dc:type mappings do not need to be specified.
Default Matching Language (enhancer.engines.linking.defaultMatchingLanguage): The engine uses the language as detected for the parsed document for matching.
Redirect Field (enhancer.engines.linking.redirectField) and Redirect Mode (enhancer.engines.linking.redirectMode): The engine uses the suggestions of the initial mention. Redirects were already processed for those suggestions. Therefore the Co-Mention engine does not need to deal with redirects.
Label Field (enhancer.engines.linking.labelField): The engine uses the initial mention as the label to search for co-mentions. Because of that, no label field needs to be configured.
Type Field (enhancer.engines.linking.typeField): The engine uses the types of the suggestions for the initial mentions.
Suggestions (enhancer.engines.linking.suggestions): The Co-Mention engine adds all suggestions of the initial mention to co-mentions.
Min Matched Tokens (enhancer.engines.linking.minFoundTokens) is set to '1' meaning that at least a single token of the initial mention needs to match co-mentions.
Min Label Score (enhancer.engines.linking.minLabelScore) is set to '1/4' meaning that at least 1/4 of the tokens for the initial mention need to be present in co-mentions.
Min Match Score (enhancer.engines.linking.minMatchScore) is set to a value so that it does not filter any results.
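
The fixed values for Min Matched Tokens and Min Label Score can be read as a simple acceptance check. The sketch below assumes exact, lower-cased token comparison. The engine's real matching is more involved (token scores, lemmas, search token distance), and the function name and parameters are hypothetical.

```python
def is_co_mention_candidate(initial_tokens, candidate_tokens,
                            min_found_tokens=1, min_label_score=0.25):
    """Check a candidate against the fixed thresholds described above.

    Simplified sketch: tokens are compared exactly after lower-casing,
    which is an assumption and not the engine's matching algorithm.
    """
    if not initial_tokens:
        return False
    candidate = {t.lower() for t in candidate_tokens}
    # Count the tokens of the initial mention that occur in the candidate.
    matched = sum(1 for t in initial_tokens if t.lower() in candidate)
    if matched < min_found_tokens:
        return False
    # At least 1/4 of the initial mention's tokens must be present.
    return matched / len(initial_tokens) >= min_label_score


print(is_co_mention_candidate(["Barack", "Obama"], ["Obama"]))   # 1 of 2 tokens
print(is_co_mention_candidate(["a", "b", "c", "d", "e"], ["a"]))  # 1 of 5 tokens
```

Under these assumptions, "Obama" qualifies as a co-mention of "Barack Obama" (1 of 2 tokens), whereas a single matching token out of five falls below the 1/4 threshold.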
