Skip to content

Commit f00f9ab

Browse files
committed
First commit
0 parents  commit f00f9ab

File tree

10 files changed

+539
-0
lines changed

10 files changed

+539
-0
lines changed

README.md

Lines changed: 134 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
# m6anormalization
2+
3+
This repository provides a package for calculating k-mer normalization
4+
constants for m6a levels inferred from DNA Nanopore reads. For more
5+
detailed information, refer to the pulication:
6+
7+
[Simultaneous Profiling of Chromatin Accessibility and DNA Methylation
8+
in Complete Plant Genomes Using Long-Read
9+
Sequencing](https://www.biorxiv.org/content/10.1101/2023.11.15.567180v2).
10+
11+
<figure>
12+
<p align="center">
13+
<img src="img/kmer-norms.png" alt="m6a normalization constants" height="400" style="vertical-align:middle"/>
14+
</p>
15+
</figure>
16+
17+
## Requirements
18+
19+
Before installation, ensure you have Python 3.x and Conda installed on your system. The package installation will automatically manage Pip and Numpy dependencies.
20+
21+
## Installation
22+
23+
24+
### Create conda environment
25+
26+
It is highly recommendable to create a dedicated Conda environment.
27+
28+
```shell
29+
conda create -n m6a python=3.6
30+
conda activate m6a
31+
```
32+
33+
### Install pip
34+
35+
If [pip](https://pypi.org/project/pip/) is not already installed, use
36+
the following command
37+
38+
```shell
39+
conda install pip
40+
```
41+
42+
### Clone the repository and install the package
43+
44+
```shell
45+
git clone https://github.com/aedera/m6anormalization
46+
cd m6anormalization
47+
pip install .
48+
```
49+
50+
## Usage
51+
52+
### Process m6a calls
53+
54+
To process m6A calls from Nanopore reads, use the
55+
[megalodon](https://github.com/nanoporetech/megalodon) tool. The
56+
output file per_read_modified_base_calls.txt is required for
57+
generating k-mer normalization constants. Execute the following
58+
commands to process it before constant calculation:
59+
60+
```shell
61+
awk '$7=="Y" {if(exp($5)>=0.75) { print $2"\t"$4"\t"$4"\t"1"\t"$3} else if(exp($5)<0.75) print $2"\t"$4"\t"$4"\t"0"\t"$3}' per_read_modified_base_calls.txt > bper_read.tmp
62+
sort -k 1,1 -k2,2n -T bper_read.tmp > bper_read.sorted.tmp
63+
bedtools merge -i bper_read.sorted.tmp -c 4,4,5 -o count,mean,distinct -d -2 | \
64+
awk '{if($3==$2) { print $1"\t"$2"\t"$3+1"\t"$5"\t"$6} else print $1"\t"$2+1"\t"$2+2"\t"$5"\t"$6}' > m6a.bed
65+
```
66+
67+
These commands discretize per-read m6A calls and aggregates them to
68+
derive methylation levels per genomic adenine. These methylation
69+
levels are stored in the `m6a.bed` file. Per-read m6a calls are
70+
discretized using 0.75 as a threshold.
71+
72+
The `m6a.bed` file should look like this
73+
74+
```
75+
Chr1 32 33 0.100 - 20
76+
Chr1 34 35 0.071 + 14
77+
Chr1 35 36 0.067 + 15
78+
Chr1 36 37 0.174 - 23
79+
Chr1 39 40 0.125 - 24
80+
Chr1 40 41 0.083 - 24
81+
Chr1 41 42 0.000 + 16
82+
Chr1 42 43 0.000 + 17
83+
Chr1 43 44 0.125 - 24
84+
Chr1 47 48 0.083 - 24
85+
Chr1 48 49 0.000 + 18
86+
Chr1 49 50 0.000 + 18
87+
Chr1 50 51 0.000 + 18
88+
```
89+
90+
where the columns indicate
91+
92+
|chromosome|start|end|m6a level|strand|coverage|
93+
|----------|-----|---|---------|------|--------|
94+
95+
96+
### Generate k-mer normalization constants
97+
98+
Use the `m6a.bed` file to generate k-mer normalization constants:
99+
100+
```shell
101+
m6anormalization generate --bed m6a.bed \
102+
--fas fas_file \
103+
--chrs Chr1,Chr2,Chr3,Chr4,Chr5 \
104+
--out kconstants.tsv
105+
```
106+
107+
`fas_file` refers to the fasta file used as input for megalodon. The k-mers are extracted from the chromosomes indicated in `--chrs`.
108+
109+
### Apply normalization constants to genome
110+
111+
Apply the generated normalization constants (`kconstants.tsv`) to
112+
normalize the m6a levels:
113+
114+
```shell
115+
m6anormalization apply --norm kconstants.tsv \
116+
--fas fas_file \
117+
--bed m6a.bed
118+
--chrs Chr1,Chr2,Chr3,Chr4,Chr5
119+
```
120+
121+
This step produces a bed file per chromosome with normalized m6a
122+
levels with the following format:
123+
124+
|chromosome|start|end|m6a level|strand|coverage|k-mer|normalized m6a level|
125+
|----------|-----|---|---------|------|--------|-----|--------------------|
126+
127+
## Contributing
128+
129+
Contributions from anyone are welcome. You can start by adding a new entry [here](https://github.com/aedera/m6anormalization/issues).
130+
131+
132+
## License
133+
134+
This package is released under the [MIT License](LICENSE).

img/kmer-norms.png

381 KB
Loading

m6anormalization/__init__.py

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser
2+
3+
__version__ = '0.1.0'
4+
5+
from m6anormalization.cli import (
6+
generate, apply,
7+
)
8+
9+
modules = [
10+
'generate', 'apply'
11+
]
12+
13+
def main():
14+
parser = ArgumentParser(
15+
'm6anormalization',
16+
formatter_class=ArgumentDefaultsHelpFormatter
17+
)
18+
19+
parser.add_argument(
20+
'-v', '--version', action='version',
21+
version='%(prog)s {}'.format(__version__)
22+
)
23+
24+
subparsers = parser.add_subparsers(
25+
title='subcommands', description='valid commands',
26+
help='additional help', dest='command'
27+
)
28+
subparsers.required = True
29+
30+
for module in modules:
31+
mod = globals()[module]
32+
p = subparsers.add_parser(module, parents=[mod.argparser()])
33+
p.set_defaults(func=mod.main)
34+
35+
args = parser.parse_args()
36+
args.func(args)
37+
38+
39+

m6anormalization/cli/__init__.py

Whitespace-only changes.

m6anormalization/cli/apply.py

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
import argparse
2+
import multiprocessing
3+
4+
from m6anormalization.utils.io import read_fas
5+
from m6anormalization.utils.utils import get_kmer
6+
7+
# import pdb
8+
9+
# class ForkedPdb(pdb.Pdb):
10+
# """A Pdb subclass that may be used
11+
# from a forked multiprocessing child
12+
13+
# """
14+
# def interaction(self, *args, **kwargs):
15+
# _stdin = sys.stdin
16+
# try:
17+
# sys.stdin = open('/dev/stdin')
18+
# pdb.Pdb.interaction(self, *args, **kwargs)
19+
# finally:
20+
# sys.stdin = _stdin
21+
22+
class Writer(multiprocessing.Process):
23+
def __init__(self, knorm_fn, fas_fn, bed_fn, chrn):
24+
super().__init__()
25+
self.knorm_fn = knorm_fn
26+
self.fas_fn = fas_fn
27+
self.bed_fn = bed_fn
28+
self.tchrn = chrn
29+
30+
def run(self):
31+
# read kmer normalization
32+
k2v = {}
33+
with open(self.knorm_fn) as f:
34+
for line in f:
35+
kmer, val = line.strip().split('\t')
36+
k2v[kmer] = float(val)
37+
38+
# read fasta file to retrieve k-mers associated to positions
39+
seq = read_fas(self.fas_fn, self.tchrn)
40+
41+
# process input bed file
42+
out = open(f'{self.tchrn}.bed', 'w')
43+
# process input bed file
44+
with open(self.bed_fn) as f:
45+
for line in f:
46+
chrn, start, stop, mlevel, strand, cov = line.strip().split('\t')
47+
if chrn != self.tchrn:
48+
continue
49+
start = int(start)
50+
mlevel = float(mlevel)
51+
52+
kmer = get_kmer(seq, start)
53+
norm = k2v[kmer] if kmer in k2v else None
54+
55+
if not norm is None:
56+
if norm != 0:
57+
new_level = mlevel/norm
58+
else:
59+
new_level = 0.
60+
61+
out.write(f'{self.tchrn}\t{start}\t{start+1}\t{mlevel}\t{strand}\t{cov}\t{kmer}\t{new_level:.4f}\n')
62+
out.close()
63+
64+
def main(args):
65+
knorm_fn = args.norm
66+
fas_fn = args.fas
67+
bed_fn = args.bed
68+
chrs = args.chrs
69+
chrs = [chrn.strip() for chrn in chrs.split(',')]
70+
71+
processes = []
72+
for chrn in chrs: # launch a process for each chromosome
73+
p = Writer(knorm_fn, fas_fn, bed_fn, chrn)
74+
p.start()
75+
processes.append(p)
76+
[p.join() for p in processes]
77+
78+
def argparser():
79+
parser = argparse.ArgumentParser(
80+
'Apply normalization constants to genome',
81+
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
82+
add_help=False
83+
)
84+
parser.add_argument("--norm", required=True, type=str)
85+
parser.add_argument("--fas", required=True, type=str)
86+
parser.add_argument("--bed", required=True, type=str)
87+
parser.add_argument("--chrs", required=True, type=str)
88+
return parser

0 commit comments

Comments
 (0)