This project has retired. For details please refer to its Attic page.
Apache Stanbol - In-Memory AnalyzedText and Annotation implementation

In-Memory AnalyzedText and Annotation implementation

This page describes the AnalyzedText implementation that the Stanbol NLP processing module uses by default. The implementation is contained directly in the org.apache.stanbol.enhancer.nlp module.


The AnalyzedTextFactory of the in-memory implementation registers itself as an OSGi service with a "service.ranking" of Integer.MIN_VALUE. As a result, any other registered AnalyzedTextFactory overrides this one (unless it also uses Integer.MIN_VALUE itself).
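
The effect of the ranking can be illustrated with a small sketch. It does not use the OSGi API: Registration and RankingDemo are hypothetical names, and the code only models the OSGi rule that the service with the highest "service.ranking" is preferred, with ties going to the lowest service id.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a registered service; not an OSGi type. */
final class Registration {
    final String name;
    final int ranking;     // models the "service.ranking" property
    final long serviceId;  // models the "service.id" assigned at registration

    Registration(String name, int ranking, long serviceId) {
        this.name = name;
        this.ranking = ranking;
        this.serviceId = serviceId;
    }
}

final class RankingDemo {
    // Highest ranking first; on equal ranking the lowest service id wins.
    static final Comparator<Registration> PREFERRED_FIRST =
            Comparator.comparingInt((Registration r) -> r.ranking)
                      .reversed()
                      .thenComparingLong(r -> r.serviceId);

    /** Returns the preferred registration; throws NoSuchElementException if the list is empty. */
    static Registration select(List<Registration> registrations) {
        return Collections.min(registrations, PREFERRED_FIRST);
    }
}
```

In this model a factory registered with Integer.MIN_VALUE loses against any factory with a higher ranking, and is only selected over another Integer.MIN_VALUE factory if it was registered earlier.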

The implementation uses the ContentItemHelper#getText(Blob blob) method to retrieve the text from the passed blob. The text is then used to create an AnalyzedText instance.

AnalyzedText Implementation

The in-memory implementation is based on a NavigableMap that uses the same span as both key and value. TreeMap is currently used as the implementation. The compareTo(..) method of the Span implementation ensures the correct ordering of Spans as specified by the Analyzed Text interface. All add**(..) methods first check whether a span with the same type and [start,end) range is already contained. If so, the existing span is returned; otherwise a new instance is created.
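
The add-or-return-existing behaviour can be sketched as follows. SimpleSpan and SpanIndex are hypothetical, simplified classes, not the Stanbol types, and the ordering in compareTo (start ascending, then longer spans first, then type) is an assumption made for this example.

```java
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical, simplified span; not the Stanbol Span class. */
final class SimpleSpan implements Comparable<SimpleSpan> {
    enum Type { TEXT, SENTENCE, CHUNK, TOKEN }

    final Type type;
    final int start;  // inclusive
    final int end;    // exclusive

    SimpleSpan(Type type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    @Override
    public int compareTo(SimpleSpan other) {
        // Assumed ordering: lower start first, then the longer span, then the type.
        int c = Integer.compare(start, other.start);
        if (c != 0) return c;
        c = Integer.compare(other.end, end);
        if (c != 0) return c;
        return type.compareTo(other.type);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SimpleSpan && compareTo((SimpleSpan) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, start, end);
    }
}

final class SpanIndex {
    // The same span object is used as key and as value, so a lookup
    // with an equal "probe" span yields the instance already stored.
    private final NavigableMap<SimpleSpan, SimpleSpan> spans = new TreeMap<>();

    SimpleSpan add(SimpleSpan.Type type, int start, int end) {
        SimpleSpan probe = new SimpleSpan(type, start, end);
        SimpleSpan existing = spans.get(probe);
        if (existing != null) {
            return existing;       // already contained: return the current span
        }
        spans.put(probe, probe);   // otherwise register the new instance
        return probe;
    }

    int size() {
        return spans.size();
    }
}
```

Adding a token for [0,5) twice returns the same object both times, while adding a sentence for the same range creates a second entry because the type differs.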

The Iterator implementation is not based on the Iterators provided by the NavigableMap, as those would throw ConcurrentModificationExceptions, which the specification prohibits. Instead, an implementation based on the #higherKey() method is used. Filtered Iterators are implemented using the Apache Commons Collections FilteredIterator utility with a Predicate based on the SpanTypeEnum.
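
A minimal sketch of an iterator built on higherKey(..) is shown below. HigherKeyIterator is a hypothetical name, the sketch assumes a map without null keys, and it covers only the plain iteration, not the type filtering.

```java
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.NoSuchElementException;

/** Hypothetical iterator that tolerates changes to the map while iterating. */
final class HigherKeyIterator<K> implements Iterator<K> {
    private final NavigableMap<K, ?> map;
    private K last;  // last key returned; null until next() was called once

    HigherKeyIterator(NavigableMap<K, ?> map) {
        this.map = map;
    }

    // Looks up the following key in the live map on every call, so no
    // modification counter is involved and keys added later are still seen.
    private K peek() {
        if (last == null) {
            return map.isEmpty() ? null : map.firstKey();
        }
        return map.higherKey(last);
    }

    @Override
    public boolean hasNext() {
        return peek() != null;
    }

    @Override
    public K next() {
        K next = peek();
        if (next == null) {
            throw new NoSuchElementException();
        }
        last = next;
        return next;
    }
}
```

Each step costs a tree lookup instead of a pointer move, which is the price for being able to add spans while an iteration is in progress.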

Annotation Implementation

The implementation of the Annotated interface is similar to that of the SolrInputDocument. Internally it uses a Map to store data. When a single value is added, it is stored directly in the map. Multiple values for the same key are stored in an array. Arrays are sorted by a comparator that ensures that the value with the highest probability is at index '0'.
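
The storage scheme can be sketched as follows. ScoredValue and AnnotationStore are hypothetical names chosen for this example; the sketch only mirrors the described behaviour (single value stored directly, several values kept in an array sorted by descending probability).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical value with a probability; not a Stanbol class. */
final class ScoredValue {
    final Object value;
    final double probability;

    ScoredValue(Object value, double probability) {
        this.value = value;
        this.probability = probability;
    }
}

final class AnnotationStore {
    // Highest probability first, so the best value ends up at index 0.
    private static final Comparator<ScoredValue> BY_PROBABILITY_DESC =
            Comparator.comparingDouble((ScoredValue v) -> v.probability).reversed();

    // Holds either a single ScoredValue or a ScoredValue[] per key.
    private final Map<String, Object> data = new HashMap<>();

    void add(String key, ScoredValue value) {
        Object current = data.get(key);
        if (current == null) {
            data.put(key, value);  // single value: stored directly
            return;
        }
        ScoredValue[] merged;
        if (current instanceof ScoredValue[]) {
            ScoredValue[] old = (ScoredValue[]) current;
            merged = Arrays.copyOf(old, old.length + 1);
            merged[old.length] = value;
        } else {
            merged = new ScoredValue[] { (ScoredValue) current, value };
        }
        Arrays.sort(merged, BY_PROBABILITY_DESC);
        data.put(key, merged);
    }

    /** Returns the value with the highest probability, or null if the key is unknown. */
    ScoredValue best(String key) {
        Object current = data.get(key);
        if (current instanceof ScoredValue[]) {
            return ((ScoredValue[]) current)[0];
        }
        return (ScoredValue) current;
    }
}
```

Storing the common single-value case without a wrapping collection saves an allocation per key, at the cost of an instanceof check on every access.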

Type safety is not checked, so creating multiple Annotations with different value types that share the same key will cause ClassCastExceptions at runtime.
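
How such a failure can arise is shown in the following sketch. TypedKey and UncheckedStore are hypothetical classes, not the Stanbol API; they assume a design in which a typed key only carries a name and the stored value is cast without a check.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical typed key; the type parameter exists only at compile time. */
final class TypedKey<T> {
    final String name;

    TypedKey(String name) {
        this.name = name;
    }
}

final class UncheckedStore {
    private final Map<String, Object> data = new HashMap<>();

    <T> void put(TypedKey<T> key, T value) {
        data.put(key.name, value);
    }

    // The cast to T is erased at runtime, so nothing fails here. The
    // ClassCastException is thrown later, where the caller uses the result.
    @SuppressWarnings("unchecked")
    <T> T get(TypedKey<T> key) {
        return (T) data.get(key.name);
    }
}
```

If one key named "pos" is declared as TypedKey<String> and another as TypedKey<Integer>, writing a String through the first and assigning the result of get(..) on the second to an Integer compiles without errors and fails with a ClassCastException only when executed.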