Skip to content

Latest commit

 

History

History

benchmark

Benchmarks for sdsl data structures

This directory contains a set of benchmarks for sdsl data structures. Each benchmark is in its own subdirectory and so far we have:

  • indexing_count: Evaluates the performance of count queries on different FM-Indexes/CSAs. Count query means How many times occurs my pattern P in the text T?
  • indexing_extract: Evaluates the performance of extracting continues sequences of text out of FM-Indexes/CSAs.
  • indexing_locate: Evaluates the performance of locate queries on different FM-Indexes/CSAs. Locate query means At which positions does pattern P occure in T?
  • rrr_vector: Evaluates the performance of the H_0-compressed bitvector rrr_vector. Operations access, rank, and select are benchmarked on different inputs.
  • wavelet_trees: Evaluates the performance of wavelet trees.

You can executed the benchmarks by calling make timing in the specific subdirectory. Test inputs will be automatically generated or downloaded from internet sources, such as the excellent Pizza&Chili website, and stored in the data directory.

Directory tmp is used to store temporary files (like plain suffix arrays) which are used to generate compressed structures.

Prerequisites

The following tools, which are available as packages for Mac OS X and most Linux distributions, are required:

  • cURL is required by the test input download script.
  • gzip is required to extract compressed files.

Literature

The benchmark code originates from the following article and can be used to easily reproduce the results presented in the paper.

Simon Gog, Matthias Petri: Optimized Succinct Data Structures for Massive Data. 2013. Accepted for publication in Software, Practice and Experience. Preprint

Author

Simon Gog (simon.gog@gmail.com)