Skip to content

U-Ar/full-br-index

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

66 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

full-br-index

br-index (https://github.com/U-Ar/br-index) also supporting pattern contractions.

About

This repository provides the fully functional bi-directional r-index (full br-index).

The r-index is the compressed text index which supports efficient count(P) and locate(P). Its uses O(r) words of space, whose r is the number of equal-letter runs in BWT of the text.

The br-index is the bi-directional text index based on the r-index, supporting left-extension, right-extension, and locate at any step of the search. It consists of the r-index on the text, the r-index on the reversed text, and the permuted LCP array (PLCP).

The full br-index in this repository is the enhanced version of the br-index with additional data structures. In addition to extensions and locate, it supports left-contraction and right-contraction, which are the inverse operations of left-extension and right-extension. There are time-space trade-offs based on the value of the parameter bl. The recommended value is around 15, and it is set to 8 by default. The supports of the five key operations enable complex searches. As an example we implement the Maximal Exact Matches query in src/bri-query.cpp.

The index is constructed by the Prefix-Free Parsing method. Suitable for highly repetitive huge text collections. The rather simple in-memory construction is also supported (-i option for bri-build), but it consumes memory.

System Requirements

  • This project is based on sdsl-lite library. Install sdsl-lite beforehand and modify variables SDSL_INCLUDE and SDSL_LIB in CMakeLists.txt.

  • This project has been tested under gcc 7.5.0, gcc 8.4.0 and gcc 12.1.0.

How to Use

Firstly, clone the repository. Since a submodule is used (iutest), recursive cloning is necessary.

git clone --recursive https://github.com/U-Ar/full-br-index.git

In order to build, execute following commands: (This project is using CMake)

mkdir build
cd build
cmake ..
make

You can also run unit tests and integration tests.

make unit-tests
make integration-tests

For the detail of the tests, refer to test/README.md.

By default 5 entry-point executables will be created in the build directory.

bri-build (Python script)
Builds the br-index on the input text file using Prefix-Free Parsing. Using -t option is not recommended now: it causes errors during the computation of multi-threaded PFP.
bri-query
Computes searching queries on the index. (count, locate, MEMs, full-task)
bri-space
Shows the statistics of the text and the breakdown of the index space usage.
unit-test
runs unit tests.
integration-test (Python script)
runs integration tests.

The codes used for experiments are archived in the experiment_archive directory.

Citations

For the br-index (without contractions), cite

  • Arakawa, Y., Navarro, G., & Sadakane, K. (2022). Bi-Directional r-Indexes. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.

The fully-functional br-index: not published.

About

Compressed text index supporting complex queries, based on RLBWT.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published