This project has retired. For details please refer to its Attic page.
Apache Stanbol - The Named Entity Recognition Engine: detect Named Entities from unstructured text content

This engine is based on the NLP features of Apache OpenNLP (incubating). It uses OpenNLP's Maximum Entropy models to detect persons, places and organizations.

(TODO: features, configuration if possible)

Example Result

For the text "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.", this engine adds TextAnnotation enhancements to the enhancement graph. Among other information, it adds the following enhancement, which suggests the type Person for the string "Bob Marley":

{
  "@subject": "urn:enhancement-b3d4617d-1760-0374-f471-e0e746003f4e",
  "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
  "dc:created": "2012-02-29T11:34:56.369Z",
  "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NEREngineCore",
  "dc:type": "dbp-ont:Person",
  "enhancer:confidence": 0.94647044,
  "enhancer:end": 89,
  "enhancer:extracted-from": "urn:content-item-sha1-37c8a8244041cf6113d4ee04b3a04d0a014f6e10",
  "enhancer:selected-text": "Bob Marley",
  "enhancer:selection-context": "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.",
  "enhancer:start": 79
}

This enhancement provides the ID and creation date of the enhancement, the suggested type with its confidence, the start and end positions of the selected text, its (sentence) context, and a link to the source document.
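
As a rough illustration of how a client might read such an enhancement, the following Python sketch parses a shortened copy of the JSON above and checks that the start and end offsets select the annotated string. The function names `summarize_text_annotation` and `offsets_select_text` and the `min_confidence` threshold are hypothetical and not part of Stanbol. The sketch also assumes that the analyzed content is exactly the single example sentence, so that offsets into the content and offsets into the selection context coincide, and that the end offset is exclusive, as the example values suggest.

```python
import json

# Shortened copy of the example enhancement shown above.
ENHANCEMENT_JSON = """
{
  "@subject": "urn:enhancement-b3d4617d-1760-0374-f471-e0e746003f4e",
  "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
  "dc:type": "dbp-ont:Person",
  "enhancer:confidence": 0.94647044,
  "enhancer:start": 79,
  "enhancer:end": 89,
  "enhancer:selected-text": "Bob Marley",
  "enhancer:selection-context": "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley."
}
"""


def summarize_text_annotation(enhancement, min_confidence=0.5):
    """Return (selected text, suggested type, confidence), or None.

    None is returned when the enhancement is not a TextAnnotation or its
    confidence is below min_confidence (a hypothetical client-side cutoff).
    """
    if "enhancer:TextAnnotation" not in enhancement.get("@type", []):
        return None
    confidence = enhancement.get("enhancer:confidence", 0.0)
    if confidence < min_confidence:
        return None
    return (
        enhancement["enhancer:selected-text"],
        enhancement["dc:type"],
        confidence,
    )


def offsets_select_text(enhancement, content):
    """Check that content[start:end] equals the selected text."""
    start = enhancement["enhancer:start"]
    end = enhancement["enhancer:end"]
    return content[start:end] == enhancement["enhancer:selected-text"]


enhancement = json.loads(ENHANCEMENT_JSON)
# Assumption: the content item consists of the example sentence only.
content = enhancement["enhancer:selection-context"]

summary = summarize_text_annotation(enhancement)
offsets_ok = offsets_select_text(enhancement, content)
print(summary, offsets_ok)
```

A client would typically run such a check for every TextAnnotation in the enhancement graph and keep only the suggestions whose confidence it considers high enough.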