In-Memory AnalyzedText and Annotation implementation
This describes the implementation of the Analyzed Text used by default by the Stanbol NLP processing module. This implementation is directly contained within the org.apache.stanbol.enhancer.nlp module.
The AnalyzedTextFactory of the in-memory implementation registers itself as an OSGi service with a "service.ranking" of Integer.MIN_VALUE. That means that any other registered AnalyzedTextFactory will override this one (unless it also uses Integer.MIN_VALUE itself).
The implementation uses the ContentItemHelper#getText(Blob blob) method to retrieve the text from the passed blob. The text is then used to create an AnalyzedText instance.
The in-memory implementation is based on a NavigableMap that uses the same span as both key and value. TreeMap is currently used as the implementation. The compareTo(..) method of the Span implementation ensures the correct ordering of Spans as specified by the Analyzed Text interface. All add**(..) methods first check whether a span with the given type and [start,end) range is already contained. If so, the existing span is returned; otherwise a new instance is created.
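The "same span as key and value" idea can be illustrated with a minimal sketch. The Span class, the SpanType enum and the ordering used in compareTo(..) below are simplified, hypothetical stand-ins (start ascending, longer spans first, then type is an assumption), not the actual Stanbol classes.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Minimal sketch, NOT the Stanbol implementation: all names are hypothetical.
class SpanMapSketch {

    enum SpanType { TEXT, SENTENCE, CHUNK, TOKEN }

    static final class Span implements Comparable<Span> {
        final SpanType type;
        final int start; // inclusive
        final int end;   // exclusive

        Span(SpanType type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        // Assumed ordering: smaller start first, then larger end first
        // (enclosing spans before enclosed ones), then by type.
        @Override
        public int compareTo(Span other) {
            int c = Integer.compare(start, other.start);
            if (c != 0) {
                return c;
            }
            c = Integer.compare(other.end, end);
            if (c != 0) {
                return c;
            }
            return type.compareTo(other.type);
        }
    }

    // The same Span instance is stored as key and as value, so a lookup
    // with an equal (throw-away) span returns the instance already stored.
    private final NavigableMap<Span, Span> spans = new TreeMap<>();

    Span add(SpanType type, int start, int end) {
        Span candidate = new Span(type, start, end);
        Span existing = spans.get(candidate);
        if (existing != null) {
            return existing; // already contained: reuse the stored instance
        }
        spans.put(candidate, candidate);
        return candidate;
    }

    int size() {
        return spans.size();
    }
}
```

Because the TreeMap compares keys with compareTo(..) only, two spans with the same type, start and end are treated as the same entry, which is what makes the "return the existing span" check a single map lookup.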
The Iterator implementation is not based on the Iterators provided by the NavigableMap, as those would throw ConcurrentModificationExceptions, which is prohibited by the specification. Instead, an implementation based on the #higherKey() method is used. Filtered Iterators are implemented using the Apache Commons Collections FilterIterator utility with a Predicate based on the SpanTypeEnum.
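A higherKey()-based iterator could look roughly like the following sketch. It is an illustration under the assumption that the iterator only remembers the last returned key and looks up its successor on demand; the class and method names are hypothetical and the filtering step is left out.

```java
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.NoSuchElementException;

// Minimal sketch, NOT the Stanbol implementation: names are hypothetical.
class HigherKeyIteratorSketch {

    // Returns an iterator over the keys of the map that tolerates
    // modifications of the map while the iteration is in progress.
    static <K> Iterator<K> iterator(final NavigableMap<K, ?> map) {
        return new Iterator<K>() {

            private K last; // last key returned, null before the first call

            // Looks up the successor lazily, so keys added after 'last'
            // during the iteration are still visited.
            private K lookupNext() {
                if (last == null) {
                    return map.isEmpty() ? null : map.firstKey();
                }
                return map.higherKey(last);
            }

            @Override
            public boolean hasNext() {
                return lookupNext() != null;
            }

            @Override
            public K next() {
                K next = lookupNext();
                if (next == null) {
                    throw new NoSuchElementException();
                }
                last = next;
                return next;
            }
        };
    }
}
```

Since no state of the map's own iterator is held, adding a key while iterating does not raise a ConcurrentModificationException; the price is one tree lookup per step instead of following an internal pointer.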
The implementation of the Annotated interface is similar to that of the SolrInputDocument. Internally it uses a Map
Type safety is not checked, so creating multiple Annotations with different value types that share the same key will cause ClassCastExceptions at runtime.
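The following sketch shows how such a ClassCastException can arise. The Annotation class, the String keys and the Map<String, Object> store are hypothetical simplifications chosen for illustration and are not taken from the Stanbol API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch, NOT the Stanbol implementation: names are hypothetical.
class AnnotationTypeSafetySketch {

    // A typed key: V is only known at compile time, not at runtime.
    static final class Annotation<V> {
        final String key;

        Annotation(String key) {
            this.key = key;
        }
    }

    // Assumed untyped store; values are kept as plain Objects.
    private final Map<String, Object> values = new HashMap<>();

    <V> void addAnnotation(Annotation<V> annotation, V value) {
        values.put(annotation.key, value);
    }

    @SuppressWarnings("unchecked")
    <V> V getAnnotation(Annotation<V> annotation) {
        // Unchecked cast: nothing verifies that the stored value is a V.
        return (V) values.get(annotation.key);
    }

    // Returns true if reading through a key with the wrong value type fails.
    static boolean sharedKeyCausesClassCastException() {
        AnnotationTypeSafetySketch annotated = new AnnotationTypeSafetySketch();
        Annotation<String> posAsString = new Annotation<>("pos");
        Annotation<Integer> posAsInteger = new Annotation<>("pos"); // same key

        annotated.addAnnotation(posAsString, "NOUN");
        try {
            // The cast to Integer happens at the call site and fails there.
            Integer wrong = annotated.getAnnotation(posAsInteger);
            wrong.intValue();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

A simple way to avoid the problem in code that follows this pattern is to define each Annotation constant once and never reuse its key for a different value type.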