This project has retired. For details please refer to its Attic page.
Apache Stanbol - The Language Detection Engine

The Language Detection Engine

The LangDetect engine determines the language of text.

Technical Description

The engine is based on the language identifier of the language-detection project.

The plain text needed for detection is retrieved from the processed ContentItem by searching for a Blob with the media type "text/plain".
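The lookup described above can be sketched as follows. This is only an illustration, not Stanbol's API: the `Blob` class and the `plain_text_of` function are hypothetical stand-ins for a content item's parts, and the UTF-8 fallback for a missing charset parameter is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Blob:
    """Hypothetical stand-in for one part of a content item."""
    mime_type: str  # may carry parameters, e.g. "text/plain; charset=UTF-8"
    data: bytes


def plain_text_of(blobs: Iterable[Blob]) -> Optional[str]:
    """Return the decoded text of the first "text/plain" blob, or None."""
    for blob in blobs:
        base, _, params = blob.mime_type.partition(";")
        if base.strip().lower() != "text/plain":
            continue
        charset = "utf-8"  # assumed default when no charset is declared
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "charset" and value.strip():
                charset = value.strip().strip('"')
        return blob.data.decode(charset)
    return None  # no plain-text part, so there is nothing to detect
```

If no blob matches, the sketch returns None, mirroring the situation in which the engine has no plain text to work with.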

The result of language identification is added to the content item's metadata as a fise:TextAnnotation. The detected language is stored as the string value of the property

http://purl.org/dc/terms/language

This RDF snippet illustrates the output:

<fise:TextAnnotation rdf:about="urn:enhancement-a147957b-41f9-58f7-bbf1-b880b3aa4b49">
    <dc:language>en</dc:language>
    <fise:confidence>0.99987</fise:confidence>
    <dc:type rdf:resource="http://purl.org/dc/terms/LinguisticSystem"/>
    <dc:creator>org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine</dc:creator>
</fise:TextAnnotation>
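
A client that receives this output may want to read the detected language and its confidence. The following is a minimal sketch using only the Python standard library. It assumes the annotation is wrapped in an `rdf:RDF` root element and that the `fise` prefix maps to `http://fise.iks-project.eu/ontology/`; neither the wrapper nor the namespace declarations appear in the snippet above. The function name `detected_languages` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions; the snippet above shows only the prefixes.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/terms/",
    "fise": "http://fise.iks-project.eu/ontology/",
}

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:fise="http://fise.iks-project.eu/ontology/">
<fise:TextAnnotation rdf:about="urn:enhancement-a147957b-41f9-58f7-bbf1-b880b3aa4b49">
    <dc:language>en</dc:language>
    <fise:confidence>0.99987</fise:confidence>
    <dc:type rdf:resource="http://purl.org/dc/terms/LinguisticSystem"/>
    <dc:creator>org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine</dc:creator>
</fise:TextAnnotation>
</rdf:RDF>"""


def detected_languages(rdf_xml):
    """Return (language, confidence) pairs from fise:TextAnnotation elements."""
    root = ET.fromstring(rdf_xml)
    results = []
    for annotation in root.findall("fise:TextAnnotation", NS):
        language = annotation.findtext("dc:language", namespaces=NS)
        if language is None:
            continue  # not a language annotation
        confidence = annotation.findtext("fise:confidence", namespaces=NS)
        results.append(
            (language, float(confidence) if confidence is not None else None)
        )
    return results


print(detected_languages(RDF_XML))
```

This sketch only handles annotations serialized as typed `fise:TextAnnotation` elements directly under the root. RDF/XML allows other equivalent serializations, so a full RDF parser is the safer choice for anything beyond a quick check.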

The list of supported languages is available here.

Configuration options