Stanbol Enhancer Stress Test Utility

As of STANBOL-670, Apache Stanbol provides a utility that allows users to stress test the Stanbol Enhancer with multiple concurrent requests. This is useful for:

Stanbol users who want to check whether their Stanbol installation can cope with such load, and how different Enhancement Chain configurations affect processing times.
Enhancement Engine developers who want to test their engines, and possibly also the services called by those engines.

In addition, this utility provides the following statistics:

Round Trip Time: The whole request/response time, from sending the request (request transmission) through server-side parsing, processing and server-side serialization to response transmission and client-side parsing.
Enhancement Chain processing: The time needed by the EnhancementJobManager to process the ContentItem. These data are provided by the Execution Metadata.
EnhancementEngine processing: The processing times of all Enhancement Engines used in the tested Enhancement Chain are tracked as well. These data are also provided by the Execution Metadata.
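How the utility aggregates these timings is not described here. As a hypothetical sketch, assuming the timings of one kind (for example all round trip times) are collected in milliseconds, a summary with count, minimum, maximum, mean and median could be computed with the JDK's LongSummaryStatistics. The class TimingSummary is an illustration, not part of Apache Stanbol.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.LongSummaryStatistics;

// Hypothetical helper, not part of Apache Stanbol: summarizes a set of timings in
// milliseconds, e.g. all round trip times or all processing times of one engine.
final class TimingSummary {
    final long count;
    final long min;
    final long max;
    final double mean;
    final double median;

    TimingSummary(long[] timingsMs) {
        if (timingsMs.length == 0) {
            throw new IllegalArgumentException("no timings recorded");
        }
        LongSummaryStatistics stats = Arrays.stream(timingsMs).summaryStatistics();
        count = stats.getCount();
        min = stats.getMin();
        max = stats.getMax();
        mean = stats.getAverage();
        // Median: middle value, or the mean of the two middle values for an even count
        long[] sorted = timingsMs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        median = sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "n=%d min=%dms max=%dms mean=%.1fms median=%.1fms",
                count, min, max, mean, median);
    }

    public static void main(String[] args) {
        // Made-up values for illustration, not measured results
        System.out.println(new TimingSummary(new long[] {120, 80, 95, 310}));
    }
}
```

The median is reported next to the mean because a few very slow requests can pull the mean up considerably under load.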

Usage

This utility is part of the Apache Stanbol integration tests and is also run during normal builds against the default chain of the Stanbol Enhancer. Like any integration test, it can also be run standalone against a Stanbol server running at a configured URL.

To use this tool, you need to check out and build Apache Stanbol and then change to the {stanbol-source}/integration-tests directory. Within this directory, the utility can be called using

mvn -o test -Dtest.server.url={stanbol-server} -Dtest=MultiThreadedTest

This sends 500 requests using 5 concurrent threads to {stanbol-server}, using DBpedia.org abstracts as content. The integration tests include up to 10000 such abstracts that can be used for testing.
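The execution model described above (a fixed number of worker threads sharing a bounded number of requests, with the round trip time measured per request) can be sketched as follows. This is an illustrative sketch, not the code of MultiThreadedTest: the class StressSketch and its run method are hypothetical, and the sendRequest task stands in for the real HTTP call to the Stanbol Enhancer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a fixed-thread-pool stress test, not Stanbol code.
final class StressSketch {

    /**
     * Runs `requests` tasks on a fixed pool of `threads` workers and returns
     * the round trip time of each task in milliseconds.
     */
    static List<Long> run(int threads, int requests, Callable<?> sendRequest)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    sendRequest.call(); // placeholder for: send content, parse RDF response
                    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                }));
            }
            List<Long> rttMs = new ArrayList<>();
            for (Future<Long> f : futures) {
                rttMs.add(f.get()); // rethrows the failure of an individual request
            }
            return rttMs;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults described above: 5 threads, 500 requests; the task here only sleeps
        List<Long> rtt = run(5, 500, () -> { Thread.sleep(2); return null; });
        System.out.println(rtt.size() + " requests completed");
    }
}
```

With a fixed pool, at most `threads` requests are in flight at any time, which is what makes the thread count a direct knob for the load placed on the server.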

This utility can be configured using the following system properties:

test.server.url: The URL of the Apache Stanbol instance that will be used for testing (e.g. http://localhost:8080)
test: The simple class name of the integration test to run. To use this tool, this MUST be set to 'MultiThreadedTest'.
stanbol.it.multithreadtest.chain: The name of the Enhancement Chain to test. If not set, the default chain is tested.
stanbol.it.multithreadtest.data: Specifies the data used for the tests. Files, resources available via the class path, and URLs are supported. Referenced data may be compressed using 'gz' or 'bz2'. 'zip' is also supported; however, only the first entry of the ZIP file is processed. Supported data formats are plain text and the RDF serializations supported by Apache Clerezza. See the section about supported test data formats for details.
stanbol.it.multithreadtest.media-type: While the tool auto-detects the media type for common file endings (e.g. .txt, .rdf, …), this property can be used to specify the media type manually. It also allows passing the charset used for plain text files (e.g. "text/plain;charset=UTF-8").
stanbol.it.multithreadtest.data-property: If RDF is used as test data, this specifies the property whose values are used as test data. If '*' is passed, all triples with literal values are used. If this property is missing, 'http://dbpedia.org/ontology/abstract' is used as default.
stanbol.it.multithreadtest.threads: The number of concurrent threads used during stress testing. The default is 5.
stanbol.it.multithreadtest.requests: The maximum number of requests. This limit only applies if the configured data set provides more data items. The default is 500. Setting a value less than or equal to 0 deactivates the limit.
stanbol.it.multithreadtest.rdf-format: The RDF serialization used for the 'Accept' header in enhancement requests. Apache Stanbol sends the Enhancement Results using this format. The default is 'application/rdf+json'.
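How MultiThreadedTest reads these properties internally is not shown here. As a sketch, assuming they are read with the standard System.getProperty and Integer.getInteger methods, the documented defaults and the request limit rule can be expressed as below. The class StressTestConfig and its effectiveRequests method are hypothetical.

```java
// Hypothetical holder for the documented options and their defaults, not Stanbol code.
final class StressTestConfig {
    static final String PREFIX = "stanbol.it.multithreadtest.";

    final String chain;        // null means: use the default chain
    final String dataProperty;
    final int threads;
    final int maxRequests;
    final String rdfFormat;

    StressTestConfig() {
        chain = System.getProperty(PREFIX + "chain");
        dataProperty = System.getProperty(PREFIX + "data-property",
                "http://dbpedia.org/ontology/abstract");
        threads = Integer.getInteger(PREFIX + "threads", 5);
        maxRequests = Integer.getInteger(PREFIX + "requests", 500);
        rdfFormat = System.getProperty(PREFIX + "rdf-format", "application/rdf+json");
    }

    /** The limit only applies if it is > 0 and the data set provides more items. */
    int effectiveRequests(int availableItems) {
        return maxRequests > 0 ? Math.min(maxRequests, availableItems) : availableItems;
    }

    public static void main(String[] args) {
        StressTestConfig config = new StressTestConfig();
        System.out.println(config.threads + " threads, up to "
                + config.effectiveRequests(10000) + " requests against "
                + (config.chain == null ? "the default chain" : config.chain));
    }
}
```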

Here is an example that makes extensive use of custom options:

mvn -o test -Dtest=MultiThreadedTest \
    -Dstanbol.it.multithreadtest.data=/stanbol/test/data/stanbol-test-data.txt.gz \
    -Dstanbol.it.multithreadtest.requests=10000 \
    -Dstanbol.it.multithreadtest.threads=20 \
    -Dstanbol.it.multithreadtest.rdf-format=text/turtle \
    -Dtest.server.url=http://www.example.org:8080/stanbol

NOTES:

Java system properties are passed using '-D{property}={value}'.
If you get OutOfMemoryErrors, you might need to increase the maximum heap size by setting the '-Xmx' option in the 'MAVEN_OPTS' environment variable (e.g. MAVEN_OPTS="-Xmx2g"). This is especially likely when using RDF test data, as those are loaded into memory.

Supported Test Data Formats

This tool supports two test data formats and can also read compressed files. The following three subsections provide details.

Plain Text Files

All test data are contained in a single text file. Individual texts are separated by two (or more) empty lines.

The following example includes three content items:

Astronomers discover largest star on record\n
\n
European astronomers have discovered the largest star yet on record; 
it is 300 times the mass of our sun, beyond the previously accepted 
limit of 150 solar masses.\n
\n
Paul Crowther, professor of astrophysics at […]\n
\n
\n
Australian election debate moved to avoid clash with cookery show\n
\n
A televised debate between Australia's candidates for Prime Minister […]\n
\n
\n
The Only Joy In Town\n
\n
by Joni Mitchell\n   
\n
I want to paint a picture\n
Botticelli * style\n
Instead of Venus on a clam *\n
I'd paint this flower child\n

Plain text test data are read sequentially from the provided source. This ensures that only about 100 content items are held in memory at any given time, which makes plain text the preferred option for large test data sets.
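The splitting rule can be sketched as a streaming reader that collects lines and emits an item once two or more consecutive empty lines have been seen, while a single empty line stays part of the item. This is a hypothetical illustration, not the tool's actual reader: PlainTextSplitter is an invented name, it treats whitespace-only lines as empty (an assumption), and it hands each item to a consumer instead of keeping the ~100 item buffer mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not Stanbol code: splits plain text test data into content items.
final class PlainTextSplitter {

    /** Reads line by line, so only the item currently being assembled is held in memory. */
    static void split(Reader in, Consumer<String> onItem) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder item = new StringBuilder();
        int emptyLines = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) { // assumption: whitespace-only lines count as empty
                emptyLines++;
                continue;
            }
            if (emptyLines >= 2) {
                emit(item, onItem);      // two or more empty lines end the current item
            } else if (emptyLines == 1 && item.length() > 0) {
                item.append('\n');       // a single empty line stays inside the item
            }
            emptyLines = 0;
            if (item.length() > 0) {
                item.append('\n');
            }
            item.append(line);
        }
        emit(item, onItem);              // the last item needs no trailing separator
    }

    private static void emit(StringBuilder item, Consumer<String> onItem) {
        if (item.length() > 0) {
            onItem.accept(item.toString());
            item.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "Title A\n\nBody A\n\n\nTitle B\nBody B\n\n\n\nTitle C\n";
        List<String> items = new ArrayList<>();
        split(new StringReader(data), items::add);
        System.out.println(items.size() + " items");
    }
}
```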

Text files are recognized by the file ending '.txt' of the passed resource. For resources with other file endings, the property 'stanbol.it.multithreadtest.media-type=text/plain' must be specified. If the test data are not encoded using 'UTF-8', the charset MUST be passed using the 'charset' parameter (e.g. 'stanbol.it.multithreadtest.media-type=text/plain;charset=iso-8859-7').
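A minimal sketch of how a value such as 'text/plain;charset=iso-8859-7' could be split into the media type and the charset, falling back to UTF-8 when no charset parameter is given, as described above. The class MediaTypeParam is hypothetical; only java.nio.charset.Charset and StandardCharsets are real JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch, not Stanbol code: parses "type/subtype;charset=..." values.
final class MediaTypeParam {
    final String mediaType;
    final Charset charset;

    private MediaTypeParam(String mediaType, Charset charset) {
        this.mediaType = mediaType;
        this.charset = charset;
    }

    /** Without a charset parameter, UTF-8 is assumed. */
    static MediaTypeParam parse(String value) {
        String[] parts = value.split(";");
        String type = parts[0].trim().toLowerCase(Locale.ROOT);
        Charset charset = StandardCharsets.UTF_8;
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                // Charset.forName throws for unknown or unsupported charset names
                charset = Charset.forName(kv[1].trim().replace("\"", ""));
            }
        }
        return new MediaTypeParam(type, charset);
    }

    public static void main(String[] args) {
        MediaTypeParam p = parse("text/plain;charset=iso-8859-7");
        System.out.println(p.mediaType + " / " + p.charset.name());
    }
}
```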

RDF data

The tool also allows RDF graphs to be used as test data, mainly because in many cases RDF dumps of public datasets, such as DBpedia.org, are the easiest source of test content. Users need to be aware that RDF data are imported into an in-memory graph.

Content Items are extracted by:

Filtering triples that use the value configured by 'stanbol.it.multithreadtest.data-property' as property ('{prefix}:{local-name}' is supported for registered prefixes). By default, 'http://dbpedia.org/ontology/abstract' is used. If '*' is configured, all triples are taken into account.
Filtering triples that have a literal value as object.
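The two filtering steps can be sketched without the Clerezza API, using a hypothetical minimal Triple type (the real tool works on Clerezza graphs). The prefix map passed in stands in for the registered prefixes, and the 'dbpedia-owl' prefix in the example is only an illustration. RdfItemFilter and its methods are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Stanbol or Clerezza code.
final class RdfItemFilter {

    /** Minimal triple; `literal` marks whether the object is an RDF literal. */
    static final class Triple {
        final String subject;
        final String property;
        final String object;
        final boolean literal;

        Triple(String subject, String property, String object, boolean literal) {
            this.subject = subject;
            this.property = property;
            this.object = object;
            this.literal = literal;
        }
    }

    /** Expands "prefix:local" if the prefix is known; full URIs and "*" stay unchanged. */
    static String expand(String property, Map<String, String> prefixes) {
        int colon = property.indexOf(':');
        if (colon > 0) {
            String namespace = prefixes.get(property.substring(0, colon));
            if (namespace != null) {
                return namespace + property.substring(colon + 1);
            }
        }
        return property;
    }

    /** Step 1: match the configured property ("*" matches all); step 2: keep literal objects. */
    static List<String> extract(List<Triple> triples, String dataProperty, Map<String, String> prefixes) {
        String property = expand(dataProperty, prefixes);
        List<String> contents = new ArrayList<>();
        for (Triple t : triples) {
            boolean propertyMatches = "*".equals(property) || property.equals(t.property);
            if (propertyMatches && t.literal) {
                contents.add(t.object);
            }
        }
        return contents;
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = Map.of("dbpedia-owl", "http://dbpedia.org/ontology/");
        List<Triple> triples = List.of(
                new Triple("http://dbpedia.org/resource/Paris", "http://dbpedia.org/ontology/abstract",
                        "Paris is the capital of France.", true),
                new Triple("http://dbpedia.org/resource/Paris", "http://dbpedia.org/ontology/country",
                        "http://dbpedia.org/resource/France", false));
        System.out.println(extract(triples, "dbpedia-owl:abstract", prefixes));
    }
}
```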

Supported RDF formats and mapped file endings:

'application/rdf+xml' - file endings '.rdf' and '.xml'
'text/turtle' - file ending '.ttl'
'application/x-turtle' - no file ending
'text/rdf+nt' - file ending '.nt'
'text/rdf+n3' - file ending '.n3'
'application/rdf+json' - file ending '.json'

If you want to use a different file ending, you need to pass the media type using the 'stanbol.it.multithreadtest.media-type' property.
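The mapping listed above, together with '.txt' for plain text, can be sketched as a lookup on the file ending, where an explicitly configured media type takes precedence. This is a hypothetical illustration (MediaTypeDetector is an invented name); the actual detection code of the tool is not shown in this documentation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Stanbol code: media type detection by file ending.
final class MediaTypeDetector {
    private static final Map<String, String> BY_ENDING = new HashMap<>();

    static {
        BY_ENDING.put("txt", "text/plain");
        BY_ENDING.put("rdf", "application/rdf+xml");
        BY_ENDING.put("xml", "application/rdf+xml");
        BY_ENDING.put("ttl", "text/turtle");
        BY_ENDING.put("nt", "text/rdf+nt");
        BY_ENDING.put("n3", "text/rdf+n3");
        BY_ENDING.put("json", "application/rdf+json");
        // 'application/x-turtle' has no file ending and must be configured explicitly
    }

    /**
     * Returns the configured media type if present, otherwise the type mapped to the
     * file ending, or null if none matches. Expects a name without compression suffix.
     */
    static String detect(String fileName, String configuredMediaType) {
        if (configuredMediaType != null && !configuredMediaType.isEmpty()) {
            return configuredMediaType;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return BY_ENDING.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(detect("dbpedia-abstracts.nt", null));
        System.out.println(detect("abstracts.data", "text/plain;charset=UTF-8"));
    }
}
```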

Support for compressed test data

Both plain text and RDF data compress well. Because of that, this utility also supports compressed files. The compression format is detected by the file ending.

The supported formats are:

'.gz'
'.bz2'
'.zip' - only the first entry in the ZIP archive is processed

Compressed files need to use double endings (e.g. 'test-data.txt.gz' or 'test-data.rdf.bz2').
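Because compressed files use double endings, a reader can strip the compression suffix first and then detect the data format from the remaining ending. The hypothetical sketch below does this and opens '.gz' and '.zip' sources with the JDK's GZIPInputStream and ZipInputStream, reading only the first ZIP entry. The JDK has no bzip2 decoder, so '.bz2' is left as a labeled gap here; a library such as Apache Commons Compress (BZip2CompressorInputStream) would be needed. CompressedSource and its methods are invented names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch, not Stanbol code: compression detection via double endings.
final class CompressedSource {

    /** Returns "gz", "bz2" or "zip", or null if the name has no compression suffix. */
    static String compression(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"gz", "bz2", "zip"}) {
            if (lower.endsWith("." + suffix)) {
                return suffix;
            }
        }
        return null;
    }

    /** Strips the compression suffix, e.g. "test-data.txt.gz" becomes "test-data.txt". */
    static String innerName(String fileName) {
        String suffix = compression(fileName);
        return suffix == null ? fileName : fileName.substring(0, fileName.length() - suffix.length() - 1);
    }

    /** Wraps the raw stream according to the compression suffix of the file name. */
    static InputStream open(String fileName, InputStream raw) throws IOException {
        String suffix = compression(fileName);
        if (suffix == null) {
            return raw;
        }
        switch (suffix) {
            case "gz":
                return new GZIPInputStream(raw);
            case "zip": {
                ZipInputStream zip = new ZipInputStream(raw);
                ZipEntry first = zip.getNextEntry(); // only the first entry is processed
                if (first == null) {
                    throw new IOException("empty ZIP archive: " + fileName);
                }
                return zip; // reading stops at the end of the first entry
            }
            default:
                // bzip2 is not supported by the JDK; an extra library would be needed
                throw new UnsupportedOperationException("bz2 requires an additional library");
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"test-data.txt.gz", "test-data.rdf.bz2", "test-data.nt"}) {
            System.out.println(name + " -> compression=" + compression(name) + ", inner=" + innerName(name));
        }
    }
}
```

The inner name returned by innerName is what the media type detection by file ending would then be applied to.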
