This project has retired. For details please refer to its Attic page.
Apache Stanbol - The Language Detection Engine

The Language Detection Engine

The LangDetect engine determines the language of text.

Technical Description

The engine is based on the language identifier of the language-detection project.
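To give a feel for this kind of language identification, here is a toy sketch. The language-detection library is documented as using a naive Bayesian filter over character n-gram profiles; the code below only illustrates that general idea and is not the library's implementation. The function names (`ngrams`, `build_profile`, `detect`), the two tiny training sets, and the add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Return the character n-grams of a lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(samples, n=3):
    """Count n-gram occurrences over a list of sample sentences."""
    counts = Counter()
    for sample in samples:
        counts.update(ngrams(sample, n))
    return counts


def detect(text, profiles, n=3):
    """Return the language whose profile gives the text the highest
    add-one-smoothed log-likelihood (hypothetical scoring, for illustration)."""
    vocabulary = set()
    for profile in profiles.values():
        vocabulary.update(profile)
    best_lang, best_score = None, float("-inf")
    for lang, profile in profiles.items():
        total = sum(profile.values())
        score = sum(
            math.log((profile[gram] + 1) / (total + len(vocabulary)))
            for gram in ngrams(text, n)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Tiny, made-up training data; real profiles are built from large corpora.
PROFILES = {
    "en": build_profile([
        "the quick brown fox jumps over the lazy dog",
        "this is a short english sentence",
        "language detection is useful",
    ]),
    "de": build_profile([
        "der schnelle braune fuchs springt über den faulen hund",
        "dies ist ein kurzer deutscher satz",
        "spracherkennung ist nützlich",
    ]),
}
```

Calling `detect("this is the language", PROFILES)` scores the text against both profiles and returns the better-matching language code. With profiles this small the result is only reliable for text close to the training sentences.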

The plain text needed for the detection is retrieved from the processed ContentItem by searching for a Blob with the media type "text/plain".
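The lookup described above can be sketched as follows. This is a simplified model, not Stanbol's Java API: the `Blob` and `ContentItem` classes and the `find_plain_text` function are hypothetical stand-ins, and the sketch assumes the text is UTF-8 encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Blob:
    """Hypothetical stand-in for one content part: a media type plus raw bytes."""
    mime_type: str
    data: bytes


@dataclass
class ContentItem:
    """Hypothetical stand-in for a content item holding several parts."""
    blobs: List[Blob] = field(default_factory=list)


def find_plain_text(item: ContentItem) -> Optional[str]:
    """Return the decoded text of the first "text/plain" part, or None."""
    for blob in item.blobs:
        # Ignore parameters such as "; charset=UTF-8" when comparing media types.
        base_type = blob.mime_type.split(";")[0].strip().lower()
        if base_type == "text/plain":
            # Assumption: the part is UTF-8 encoded.
            return blob.data.decode("utf-8")
    return None
```

In this sketch a content item without a "text/plain" part yields `None`, so a caller can skip language detection for that item.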

The result of language identification is added as a fise:TextAnnotation to the content item's metadata, as the string value of the property

This RDF snippet illustrates the output:

<fise:TextAnnotation rdf:about="urn:enhancement-a147957b-41f9-58f7-bbf1-b880b3aa4b49">
    <dc:type rdf:resource=""/>
</fise:TextAnnotation>

The list of supported languages is available here.

Configuration options