RESTful Language Identification Service
STANBOL-894 added a standard RESTful Language Identification service that can be used to integrate NLP processing frameworks that support language identification.
On the Stanbol Enhancer side, the service is consumed by the RESTful Language Identification Engine. Integrators of language identification functionality therefore only need to implement the RESTful service.
This option of integrating an NLP framework with the Stanbol Enhancer should be considered in the following scenarios:
- NLP frameworks that are not implemented in Java: integrators can implement the RESTful service in the programming language of their choice.
- Avoiding OSGi: the RESTful service does not need to run inside an OSGi environment, and all utilities provided by Apache Stanbol work both inside and outside OSGi.
- NLP frameworks under licenses with strong copyleft, such as GPL and AGPL: integrating an NLP framework as an NLP EnhancementEngine means linking against the API of the NLP framework, which is a problem for users with non-open-source extensions to Apache Stanbol. Integrating such frameworks as a standalone server that provides a RESTful service avoids this problem.
- Crashes of the NLP framework do not affect Stanbol: especially for NLP frameworks that use native libraries, any error may cause the JVM to crash. Running the framework as a separate service keeps such crashes outside the Stanbol JVM.
- Distribution: integration over RESTful services allows NLP processing tasks to be distributed across different servers.
RESTful Service specification
- Method: POST {service-baseuri}
- Request Headers:
  - Content-Type: Must be text/plain; charset={charset}. If the charset parameter is missing, UTF-8 is used as the default.
- Response: The JSON-serialized information about the detected languages (see the specification below)
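Because the specification is this small, a compliant service can be sketched with nothing but the Python standard library. The following is a minimal, non-authoritative sketch: `detect_languages` is a hypothetical placeholder for the actual NLP framework (it always reports English), and the helper names, the 415/400 error responses, and the `serve` defaults are assumptions not defined by the specification. It reads the body using Content-Length, so chunked request bodies are not handled.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def detect_languages(text):
    """Hypothetical placeholder for the real NLP framework.

    Returns (language, probability) tuples; this stub always reports English.
    """
    return [("en", 1.0)] if text.strip() else []


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type header, or UTF-8."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"')
    return "UTF-8"  # default required by the specification


class LangIdentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        content_type = self.headers.get("Content-Type", "")
        media_type = content_type.split(";")[0].strip().lower()
        if media_type != "text/plain":
            # Assumed error handling; the specification does not define it.
            self.send_error(415, "Content-Type must be text/plain")
            return
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            text = raw.decode(charset_from_content_type(content_type))
        except (LookupError, UnicodeDecodeError):
            self.send_error(400, "Cannot decode request body")
            return
        # Serialize as a JSON array of {"lang": ..., "prob": ...} objects.
        result = [{"lang": lang, "prob": prob}
                  for lang, prob in detect_languages(text)]
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="localhost", port=8080):
    """Run the sketch; host and port are assumptions."""
    HTTPServer((host, port), LangIdentHandler).serve_forever()
```

Calling `serve()` starts the sketch on port 8080. The handler answers POST requests on any path, so {service-baseuri} can be whatever URI the RESTful Language Identification Engine is configured with.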
JSON Representation for Detected Languages
The detected languages are encoded as a JSON array. Each element of the array must define the "lang" attribute with a string value representing the language, and may define an optional "prob" attribute with a numerical value representing the probability.
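A client consuming the service should check that every element carries a string "lang" and, when present, a numeric "prob". The sketch below does this with Python's json module; the function name and the choice to raise ValueError on malformed input are assumptions, not part of the specification.

```python
import json
import numbers


def parse_detected_languages(body):
    """Parse and validate a detected-languages JSON document.

    Returns a list of (lang, prob) tuples, with prob set to None when the
    optional "prob" attribute is absent. Raises ValueError (json's
    JSONDecodeError is a subclass) if the document does not match the format.
    """
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    result = []
    for element in data:
        if not isinstance(element, dict):
            raise ValueError("array elements must be JSON objects")
        lang = element.get("lang")
        if not isinstance(lang, str):
            raise ValueError('each element needs a string "lang" attribute')
        prob = element.get("prob")
        # bool is a subclass of int in Python, so reject it explicitly;
        # an explicit null is also rejected because "prob" must be numeric.
        if "prob" in element and (isinstance(prob, bool)
                                  or not isinstance(prob, numbers.Real)):
            raise ValueError('"prob" must be a number when present')
        result.append((lang, prob))
    return result
```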
Example
A POST request with a Content-Type: text/plain header and the content of the file en.txt as the request body
curl -i -X POST -H "Content-Type: text/plain" -T en.txt http://localhost:8080/langident
returns a JSON array with the detected languages:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.0.x)
[{"lang":"en","prob":0.907},{"lang":"fr","prob":0.532},{"lang":"it","prob":0.384}]
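The same request can also be issued from code instead of curl. This is a minimal sketch using Python's urllib.request, assuming the service URI http://localhost:8080/langident from the curl example; `build_request` and `identify_languages` are hypothetical helper names, and the decoded array is returned without further validation.

```python
import json
import urllib.request


def build_request(service_uri, text, charset="UTF-8"):
    """Build a POST request with a text/plain body, as the specification requires."""
    return urllib.request.Request(
        service_uri,
        data=text.encode(charset),
        headers={"Content-Type": f"text/plain; charset={charset}"},
        method="POST",
    )


def identify_languages(service_uri, text, charset="UTF-8", timeout=10):
    """POST the text to the service and return the decoded JSON array."""
    request = build_request(service_uri, text, charset)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # json.loads accepts the raw UTF-8 bytes of the response body.
        return json.loads(response.read())
```

For example, `identify_languages("http://localhost:8080/langident", open("en.txt", encoding="utf-8").read())` corresponds to the curl call above.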