A manuscript describing this software by Eric Roberts and Michael M Hoffman is in preparation. A preprint with an updated citation will be posted here when available.
Newmap is a software package that efficiently identifies uniquely mappable regions of any genome. It does so by reporting, for every position, the read lengths that are unique within that genome. From the range of unique read lengths produced, the single-read mappability and the multi-read mappability for a specific read length can be generated.
Newmap can search for unique k-mer/read lengths at specific values or across entire continuous ranges. For ranges, it uses a binary search to find the minimum unique k-mer/read length at each position.
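The binary search over a continuous range can be sketched as follows. This is a minimal illustration, not Newmap's implementation: the function names are hypothetical, a naive substring count stands in for the index lookups, and only the forward strand is considered. It relies on one fact: if a k-mer starting at a position occurs once, every longer k-mer starting at the same position also occurs once.

```python
def count_occurrences(genome, kmer):
    """Count occurrences of kmer in genome, including overlapping ones."""
    count = 0
    start = genome.find(kmer)
    while start != -1:
        count += 1
        start = genome.find(kmer, start + 1)
    return count


def min_unique_length(genome, pos, lo, hi):
    """Smallest k in [lo, hi] such that genome[pos:pos + k] occurs exactly once.

    Returns 0 when no length in the range is unique (an illustrative
    convention chosen for this sketch).
    """
    hi = min(hi, len(genome) - pos)  # the k-mer must fit inside the sequence
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if count_occurrences(genome, genome[pos:pos + mid]) == 1:
            best = mid      # unique: try to find a shorter unique length
            hi = mid - 1
        else:
            lo = mid + 1    # not unique: only longer k-mers can be unique
    return best


toy_genome = "ACGTACGA"
toy_lengths = [min_unique_length(toy_genome, p, 1, 8) for p in range(len(toy_genome))]
```

In the toy sequence, the k-mer starting at position 0 only becomes unique at length 4 (ACGT), because A, AC and ACG all occur again at position 4.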
Newmap requires a CPU that supports the AVX2 instruction set.
Newmap requires Python 3.9 or later. It also requires numpy and the AvxWindowFMIndex library, both of which are installed automatically when using the methods below.
Currently only Linux is supported, but it may be possible to build and run on other operating systems. Notably, a compiler with OpenMP support is required for parallel processing. See the documentation for more details on building from source.
The latest documentation for Newmap is available on Read the Docs.
All commands have a --help option to provide additional usage information.
pip install newmap
conda install bioconda::newmap
You can download a test genome from the Newmap repository to follow along with the usage example below.
curl -sL https://raw.githubusercontent.com/hoffmangroup/newmap/refs/heads/master/tests/data/genome.fa > genome.fa
To speed up index creation in the following step, use the options --seed-length=1 and --compression-ratio=1 for this very small test genome. For other genomes, the default values are recommended.
newmap index genome.fa
By default, this will create a genome.awfmi file in the current directory.
To search the entire genome for unique lengths from 20 to 200 bp, using 20 threads and printing status information:
newmap search --verbose --num-threads=20 --search-range=20:200 --output-directory=unique_lengths genome.fa
This will create *.unique.uint8 files (one for each sequence ID) in the unique_lengths directory.
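To inspect one of these files from Python, a sketch along the following lines may help. It rests on assumptions not stated above: that each file is a headerless array of unsigned 8-bit integers with one value per sequence position, and that 0 marks a position where no unique length was found in the searched range. Check the Newmap documentation before relying on either. The helper names and the demo file are hypothetical; the demo writes its own small file so the example runs without any Newmap output.

```python
import tempfile
from pathlib import Path


def read_unique_lengths(path):
    """Read a *.unique.uint8 file into a list of ints.

    ASSUMPTION: the file is a raw array of unsigned 8-bit integers,
    one per position, with no header.
    """
    return list(Path(path).read_bytes())


def summarize(lengths, read_length):
    """Count positions by whether a unique length was recorded.

    ASSUMPTION: a value of 0 means no unique length was found.
    """
    found = [k for k in lengths if k != 0]
    return {
        "positions": len(lengths),
        "with_unique_length": len(found),
        "unique_at_read_length": sum(1 for k in found if k <= read_length),
    }


# Demo on a synthetic six-position file (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "chrDemo.unique.uint8"
    demo_path.write_bytes(bytes([0, 22, 20, 0, 31, 24]))
    demo_lengths = read_unique_lengths(demo_path)
    demo_summary = summarize(demo_lengths, read_length=24)
```

Since numpy is already installed with Newmap, numpy.fromfile(path, dtype=numpy.uint8) would be the faster choice for real genomes under the same format assumption.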
To output single-read and multi-read mappability for a 24 bp read length:
newmap track --single-read=24.bed --multi-read=24.wig 24 unique_lengths/*.unique.uint8
For each of single-read and multi-read mappability, this will generate a single file that contains the mappability for all sequences listed in the unique_lengths directory. The resulting BED file contains the single-read mappability, and the WIG file contains the multi-read mappability.
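The relationship between minimum unique lengths and the two tracks can be sketched as below. This follows the Umap definitions as an assumption: a position is single-read mappable if at least one unique k-mer overlaps it, and its multi-read mappability is the fraction of overlapping k-mers that are unique. The function names, the use of 0 for "no unique length", and the handling of sequence ends (dividing by the number of k-mers that actually fit) are illustrative choices and may differ from what newmap track does.

```python
def unique_kmer_starts(min_unique, k):
    """For each start position, is the k-mer beginning there unique?

    A k-mer is unique when it fits in the sequence and a unique length
    no longer than k was recorded at its start (0 means none recorded).
    """
    n = len(min_unique)
    return [i + k <= n and 0 < min_unique[i] <= k for i in range(n)]


def mappability(min_unique, k):
    """Return (single_read, multi_read) lists for read length k."""
    n = len(min_unique)
    is_unique = unique_kmer_starts(min_unique, k)
    single = [False] * n
    multi = [0.0] * n
    for pos in range(n):
        # Start positions of the k-mers that overlap pos and fit in the sequence.
        starts = range(max(0, pos - k + 1), min(pos, n - k) + 1)
        if len(starts) == 0:
            continue  # sequence shorter than k: nothing overlaps
        n_unique = sum(is_unique[s] for s in starts)
        single[pos] = n_unique > 0
        multi[pos] = n_unique / len(starts)
    return single, multi


toy_single, toy_multi = mappability([0, 2, 0, 3, 0, 0], k=3)
```

Here the single-read values correspond to what a BED file would encode as covered intervals, and the multi-read values to per-position scores in a WIG file.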
Newmap is a reimplementation of the output of Umap. Umap was developed by Mehran Karimzadeh. The repository for that implementation is found at https://www.github.com/hoffmangroup/umap. Umap in turn was originally developed by Anshul Kundaje and was written in MATLAB. The original repository is available at https://sites.google.com/site/anshulkundaje/projects/mappability.
This project uses the excellent AvxWindowFMIndex library. Read the published article at https://doi.org/10.1186/s13015-021-00204-6.