br-index (https://github.com/U-Ar/br-index) also supporting pattern contractions.
This repository provides the fully functional bi-directional r-index (full br-index).
The r-index is the compressed text index which supports efficient count(P) and locate(P). Its uses O(r) words of space, whose r is the number of equal-letter runs in BWT of the text.
The br-index is the bi-directional text index based on the r-index, supporting left-extension, right-extension, and locate at any step of the search. It consists of the r-index on the text, the r-index on the reversed text, and the permuted LCP array (PLCP).
The full br-index in this repository is the enhanced version of the br-index with additional data structures. In addition to extensions and locate, it supports left-contraction and right-contraction, which are the inverse operations of left-extension and right-extension. There are time-space trade-offs based on the value of the parameter bl. The recommended value is around 15, and it is set to 8 by default. The supports of the five key operations enable complex searches. As an example we implement the Maximal Exact Matches query in src/bri-query.cpp.
The index is constructed by the Prefix-Free Parsing method. Suitable for highly repetitive huge text collections. The rather simple in-memory construction is also supported (-i option for bri-build), but it consumes memory.
-
This project is based on sdsl-lite library. Install sdsl-lite beforehand and modify variables SDSL_INCLUDE and SDSL_LIB in CMakeLists.txt.
-
This project has been tested under gcc 7.5.0, gcc 8.4.0 and gcc 12.1.0.
Firstly, clone the repository. Since a submodule is used (iutest), recursive cloning is necessary.
git clone --recursive https://github.com/U-Ar/full-br-index.git
In order to build, execute following commands: (This project is using CMake)
mkdir build
cd build
cmake ..
make
You can also run unit tests and integration tests.
make unit-tests
make integration-tests
For the detail of the tests, refer to test/README.md.
By default 5 entry-point executables will be created in the build directory.
- bri-build (Python script)
- Builds the br-index on the input text file using Prefix-Free Parsing. Using -t option is not recommended now: it causes errors during the computation of multi-threaded PFP.
- bri-query
- Computes searching queries on the index. (count, locate, MEMs, full-task)
- bri-space
- Shows the statistics of the text and the breakdown of the index space usage.
- unit-test
- runs unit tests.
- integration-test (Python script)
- runs integration tests.
The codes used for experiments are archived in the experiment_archive directory.
For the br-index (without contractions), cite
- Arakawa, Y., Navarro, G., & Sadakane, K. (2022). Bi-Directional r-Indexes. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
The fully-functional br-index: not published.