Commit 4aa3e0b

update documentation
1 parent 7287479 commit 4aa3e0b

3 files changed: +74 -22 lines

README.md

Lines changed: 72 additions & 22 deletions
@@ -74,43 +74,63 @@ singularity exec peakscout_latest.sif peakScout --help
 ## Usage
 
 ### Decomposing Reference GTF
+The first step of peakScout is to create the decomposed reference.
+
+| Parameter | Type | Description |
+|------------|-------|-----------------------------------------------|
+| `ref_dir` | `str` | The directory to store the GTF decompositions.|
+| `gtf_ref` | `str` | The path to the GTF file. |
+
 To decompose a reference GTF file so that it can be used by peakScout, run the following command
 ```bash
-peakScout decompose --ref_dir /path/to/where/outputs/stored --species species of gtf --gtf_ref /path/to/gtf/file
+peakScout decompose \
+  --ref_dir /path/to/where/outputs/stored \
+  --gtf_ref /path/to/gtf/file
 ```
 
 Specific example:
 
 ```bash
 peakScout decompose \
-  --ref_dir reference/ \
-  --species mm39 \
+  --ref_dir reference/mm39/ \
   --gtf_ref reference/gencode.vM37.basic.annotation.gtf
 ```
 
-A directory called `reference/mm39` will be created and you will use the `mm39` as the species name in other peakScout operations.
+A directory called `reference/mm39` will be created and should be used as the `ref_dir` argument for downstream peakScout operations.
 
 ### Finding Nearest Genes
 
 Once a reference GTF has been decomposed, you can use the decomposition to find the nearest genes to your peaks. Peak files can be MACS2, SEACR outputs, or standard BED6 format files and can be Excel sheets or BED files.
 
-Run the following command to create an Excel sheet containing the nearest k genes to your peaks
-```bash
-peakScout peak2gene --peak_file /path/to/peak/file --peak_type MACS2/SEACR/BED6 --species species of gtf --k number of nearest genes --ref_dir /path/to/reference/directory --output_name name of output file --o /path/to/save/output --output_type csv/xslx
-```
-
-Specific example:
+| Parameter | Type | Description |
+|-----------------|---------|--------------------------------------------------------------------------------------|
+| `peak_file` | `str` | Path to the peak file. |
+| `peak_type` | `str` | Format of the peak file (MACS2, SEACR, or BED6). |
+| `num_features` | `int` | Number of nearest features to find. |
+| `ref_dir` | `str` | Directory containing decomposed reference data. |
+| `output_name` | `str` | Name of the output file. |
+| `out_dir` | `str` | Directory in which to write the output file. |
+| `output_type` | `str` | Output type (`csv` or `xlsx`). |
+| `species_genome`| `str` | Species of the reference genome. |
+| `option` | `str` | Option for defining start and end positions of peaks. Default `native_peak_boundaries`. |
+| `boundary` | `int` | Boundary for the artificial peak boundary option; `None` for other options. |
+| `up_bound` | `int` | Maximum allowed distance between peak and upstream feature. Default `None`. |
+| `down_bound` | `int` | Maximum allowed distance between peak and downstream feature. Default `None`. |
+| `consensus` | `bool` | Whether to use consensus peaks. Default `False`. |
+| `drop_columns` | `bool` | Whether to drop unnecessary columns from the original file. Default `False`. |
+| `view_window` | `float` | Proportion of the genome browser window occupied by the peak region. Default `0.2`. |
 
+Run the following command to create an Excel sheet containing the nearest k genes to your peaks
 ```bash
 peakScout peak2gene \
-  --peak_file test/test_MACS2.bed \
-  --peak_type MACS2 \
-  --species mm39 \
-  --k 2 \
-  --ref_dir reference/mm39 \
-  --output_name peakScout_test_MACS2 \
-  --o my_output_dir \
-  --output_type xslx
+  --peak_file /path/to/peak/file \
+  --peak_type MACS2/SEACR/BED6 \
+  --species_genome UCSC genome of the GTF, e.g. mm39 \
+  --k number of nearest genes \
+  --ref_dir /path/to/reference/directory \
+  --output_name name of output file \
+  --o /path/to/save/output \
+  --output_type csv/xlsx
 ```
 
 Specific example:
@@ -119,7 +139,7 @@ Specific example:
 peakScout peak2gene \
   --peak_file test/test_MACS2.bed \
   --peak_type MACS2 \
-  --species mm39 \
+  --species_genome mm39 \
   --k 2 \
   --ref_dir reference/mm39 \
   --output_name peakScout_test_MACS2 \
@@ -129,16 +149,46 @@ peakScout peak2gene \
 
 ### Finding Nearest Peaks
 
-Once a reference GTF has been decomposed, you can use the decomposition to find the nearest peaks to a set of genes. Peak files can be MACS2, SEACR outputs, or standard BED6 format files and can be Excel sheets or BED files. Gene names should be in a single column CSV file with no header.
+Once a reference GTF has been decomposed, you can also use the decomposition to find the nearest peaks to a set of genes. Peak files can be MACS2, SEACR outputs, or standard BED6 format files and can be Excel sheets or BED files. Gene names should be in a single-column CSV or TXT file with no header.
+
+| Parameter | Type | Description |
+|----------------|--------|---------------------------------------------------------------------------------------|
+| `peak_file` | `str` | Path to the peak file. |
+| `peak_type` | `str` | Format of the peak file (MACS2, SEACR, or BED6). |
+| `gene_file` | `str` | Path to the gene file. |
+| `num_features` | `int` | Number of nearest features to find. |
+| `ref_dir` | `str` | Directory containing decomposed reference data. |
+| `output_name` | `str` | Name of the output file. |
+| `out_dir` | `str` | Directory in which to write the output file. |
+| `output_type` | `str` | Output type (`csv` or `xlsx`). |
+| `option` | `str` | Option for defining start and end positions of peaks. Default `native_peak_boundaries`. |
+| `boundary` | `int` | Boundary for the artificial peak boundary option; `None` for other options. |
+| `consensus` | `bool` | Whether to use consensus peaks. Default `False`. |
 
 Run the following command to create an Excel sheet containing the nearest k peaks to your genes
 ```bash
-peakScout gene2peak --peak_file /path/to/peak/file --peak_type MACS2/SEACR/BED6 --gene_file /path/to/gene/file --species species of gtf --k number of nearest peaks --ref_dir /path/to/reference/directory --output_name name of output file --o /path/to/save/output --output_type csv/xslx
+peakScout gene2peak \
+  --peak_file /path/to/peak/file \
+  --peak_type MACS2/SEACR/BED6 \
+  --gene_file /path/to/gene/file \
+  --k number of nearest peaks \
+  --ref_dir /path/to/reference/directory \
+  --output_name name of output file \
+  --o /path/to/save/output \
+  --output_type csv/xlsx
 ```
 
 Specific example:
 ```bash
-peakScout gene2peak --peak_file /path/to/peak/file --peak_type MACS2/SEACR/BED6 --gene_file /path/to/gene/file --species species of gtf --k number of nearest peaks --ref_dir /path/to/reference/directory --output_name name of output file --o /path/to/save/output --output_type csv/xslx
+peakScout gene2peak \
+  --peak_file test/test_MACS2.bed \
+  --peak_type MACS2 \
+  --gene_file test/test_genes.txt \
+  --k 3 \
+  --ref_dir reference/mm39 \
+  --output_name test_gene2peak_MACS2 \
+  --o my_output_dir \
+  --output_type csv
 ```
 
 ## peakScout ready-made references for common organisms
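
The peak2gene parameters `num_features`, `up_bound`, and `down_bound` control how many genes are reported per peak and how far away those genes may be. The README does not say how distance is measured. What follows is therefore only a minimal sketch of a bounded k-nearest lookup, under these assumptions:

- Intervals use BED-style 0-based, end-exclusive coordinates.
- Distance is the gap between peak and gene, and is 0 when they overlap.
- "Upstream" and "downstream" mean lower and higher genomic coordinates, ignoring gene strand.

`Interval`, `signed_gap`, and `nearest_features` are hypothetical names, not peakScout functions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    """A genomic interval in BED-style 0-based, end-exclusive coordinates."""
    chrom: str
    start: int
    end: int
    name: str = ""


def signed_gap(peak: Interval, feature: Interval) -> int:
    """Signed gap from the peak to a feature on the same chromosome.

    Negative: feature lies upstream (lower coordinates) of the peak.
    Positive: feature lies downstream (higher coordinates) of the peak.
    Zero: the intervals overlap or touch.
    """
    if feature.end <= peak.start:
        return feature.end - peak.start
    if feature.start >= peak.end:
        return feature.start - peak.end
    return 0


def nearest_features(
    peak: Interval,
    features: List[Interval],
    num_features: int,
    up_bound: Optional[int] = None,
    down_bound: Optional[int] = None,
) -> List[Interval]:
    """Return up to num_features features closest to the peak (hypothetical sketch)."""
    candidates = []
    for feat in features:
        if feat.chrom != peak.chrom:
            continue
        gap = signed_gap(peak, feat)
        # Skip features that lie beyond the optional upstream/downstream limits.
        if gap < 0 and up_bound is not None and -gap > up_bound:
            continue
        if gap > 0 and down_bound is not None and gap > down_bound:
            continue
        candidates.append((abs(gap), feat.start, feat))
    # Closest first; ties broken by feature start position.
    candidates.sort(key=lambda c: (c[0], c[1]))
    return [feat for _, _, feat in candidates[:num_features]]


genes = [
    Interval("chr1", 100, 200, "GeneA"),
    Interval("chr1", 1_000, 1_500, "GeneB"),
    Interval("chr1", 5_000, 6_000, "GeneC"),
    Interval("chr2", 300, 400, "GeneD"),
]
peak = Interval("chr1", 900, 950)
print([g.name for g in nearest_features(peak, genes, num_features=2)])
# ['GeneB', 'GeneA']
print([g.name for g in nearest_features(peak, genes, num_features=2, up_bound=500)])
# ['GeneB', 'GeneC']
```

This linear scan keeps the idea visible. A production implementation would more likely index features per chromosome, for example with sorted start positions and binary search, instead of scanning every feature for each peak.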

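For gene2peak, the README expects two inputs:

- gene names in a single-column CSV or TXT file with no header;
- peaks in one of several supported formats.

As a hedged illustration of preparing such inputs, the sketch below uses the standard library to read a header-less gene list and a plain tab-separated BED6 file (chrom, start, end, name, score, strand). `read_gene_list`, `read_bed6`, and `Bed6Record` are hypothetical helpers, not part of peakScout. MACS2, SEACR, and Excel inputs are not handled here.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Bed6Record(NamedTuple):
    chrom: str
    start: int   # 0-based start (BED convention)
    end: int     # end coordinate, exclusive
    name: str
    score: str   # kept as text: some files use "." for a missing score
    strand: str


def read_gene_list(handle: TextIO) -> List[str]:
    """Read gene names from a single-column CSV/TXT file with no header row."""
    genes = []
    for row in csv.reader(handle):
        # csv.reader yields [] for blank lines; skip those and whitespace-only cells.
        if row and row[0].strip():
            genes.append(row[0].strip())
    return genes


def read_bed6(handle: TextIO) -> List[Bed6Record]:
    """Parse tab-separated BED6 lines, skipping blank, comment, track and browser lines."""
    records = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, score, strand = line.split("\t")[:6]
        records.append(Bed6Record(chrom, int(start), int(end), name, score, strand))
    return records


genes = read_gene_list(io.StringIO("GeneA\nGeneB\n\nGeneC\n"))
peaks = read_bed6(io.StringIO("track name=peaks\nchr1\t900\t950\tpeak1\t100\t.\n"))
print(genes)     # ['GeneA', 'GeneB', 'GeneC']
print(peaks[0])  # Bed6Record(chrom='chr1', start=900, end=950, name='peak1', score='100', strand='.')
```

The gene list goes through `csv.reader` rather than a plain line split, so a stray trailing comma in a CSV export still yields the bare gene name.
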
src/peak2gene.py

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ def peak2gene(
 down_bound (int): Maximum allowed distance between peak and downstream feature.
 consensus (bool): Whether to use consensus peaks. Default False.
 drop_columns (bool): Whether to drop unnecessary columns from the original file. Default False.
+view_window (float): Proportion of the genome browser window occupied by the peak region. Default 0.2.
 
 Returns:
 None

src/process_features.py

Lines changed: 1 addition & 0 deletions
@@ -370,6 +370,7 @@ def get_ucsc_browser_urls(
 Parameters:
 species_genome (str): Species of the reference genome.
 df (pl.DataFrame): Polars DataFrame containing peak information.
+view_window (float): Proportion of the genome browser window occupied by the peak region.
 
 Returns:
 urls (list): List of UCSC Genome Browser URLs for each peak.

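The new `view_window` parameter of `get_ucsc_browser_urls` (default 0.2 in `peak2gene`) is documented as the share of the browser view taken up by the peak. The arithmetic behind that can be sketched under the following assumptions:

- The window is centred on the peak.
- Its width is the peak width divided by `view_window`.
- Input coordinates are BED-style and 0-based.
- The link uses the public UCSC `hgTracks` form with `db` and `position` query parameters.

`ucsc_url` is a hypothetical single-peak helper. The committed function works on a Polars DataFrame, and its exact padding and rounding may differ.

```python
def ucsc_url(species_genome: str, chrom: str, start: int, end: int,
             view_window: float = 0.2) -> str:
    """Build a UCSC Genome Browser link in which the peak fills about view_window of the view.

    Hypothetical sketch: start/end are assumed to be BED-style 0-based,
    end-exclusive coordinates, and the window is centred on the peak.
    """
    if not 0 < view_window <= 1:
        raise ValueError("view_window must be in the interval (0, 1]")
    peak_width = end - start
    # If the peak is view_window of the window, the window is peak_width / view_window wide.
    window_width = round(peak_width / view_window)
    # Split the extra space evenly on both sides of the peak, clamping at the chromosome start.
    pad = (window_width - peak_width) // 2
    win_start = max(0, start - pad)
    win_end = end + pad
    # UCSC 'position' is 1-based and inclusive, so shift the 0-based start by one.
    return (
        "https://genome.ucsc.edu/cgi-bin/hgTracks"
        f"?db={species_genome}&position={chrom}:{win_start + 1}-{win_end}"
    )


print(ucsc_url("mm39", "chr1", 1000, 1200))
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=mm39&position=chr1:601-1600
```

With the default of 0.2, a 200 bp peak is shown in a 1,000 bp window. The sketch does not clamp the window at the chromosome end, because chromosome lengths are not part of its input.
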
0 commit comments
