Commit 28dde62

Update Readme
Document where to find pipelines to produce input files.
1 parent 4cb6ed1 commit 28dde62

4 files changed, with 32 additions and 30 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
 data/*
 tmp/*
 envs/*
+logs/*
+.snakemake
+falcon-comb*

README.md

Lines changed: 22 additions & 23 deletions
@@ -1,36 +1,35 @@
-Template workflow folder for Snakemake pipeline
-===============================================
+Pogigwasc gene prediction of Loxodes magnus genome
+===================================================
 
-After cloning this repository, you should change the name of the folder as
-appropriate, and update the remote URL of the repository to a new one for your
-project.
+Snakemake pipeline for gene prediction for Loxodes magnus, which has a genetic
+code with context-dependent stop codons. Introns are first empirically
+predicted with [Intronarrator](https://github.com/Swart-lab/Intronarrator) and
+artificially removed to produce an "intronless" assembly, to run
+[Pogigwasc](https://github.com/Swart-lab/pogigwasc) in `--no-intron` mode. This
+is because the short lengths and unusual length distribution of introns in
+Loxodes are difficult to model with the GHMM in Pogigwasc.
 
+Data
+----
 
-Suggested setup
----------------
+Pipeline and scripts to generate the genome assembly are available from
+[loxodes-assembly-workflow](https://github.com/Swart-lab/loxodes-assembly-workflow)
+repository. Pipeline for the "intronless" assembly is available from
+[loxodes-intronarrator-workflow](https://github.com/Swart-lab/loxodes-intronarrator-workflow).
 
-```bash
-git clone git@github.com:Swart-lab/snakemake-template.git
-mv snakemake-template my-project # rename project folder
-cd my-project
-mkdir data # folder to put project data, gitignored
-mkdir envs # folder for Conda envs produced by workflow, gitignored
-mkdir tmp # folder for temp files, gitignored
-mkdir nb # folder for computational notebooks etc.
-git remote remove origin # remove template repo as a remote
-```
+This current pipeline was used for annotation of the MAC and MIC genomes; path
+to reference assembly and names of output files were modified accordingly.
 
-Edit the files `run_snakemake.sh` and/or `run_snakemake_sge.sh` to include
-absolute paths to the working folder and to a Conda environment with Snakemake,
-and modify other settings (e.g. max number of CPUs) as required.
+Paths to input files in the `workflow/config.yaml` file are local paths used in
+the original data analysis. When re-running the pipeline, replace these with
+the actual paths on your system.
 
-Snakemake rules and config files are in the `workflow/` subfolder.
+Curated output from this annotation is included in the [archive of genome
+annotations](https://doi.org/10.17617/3.9QTROS).
 
 
 Running workflow
 ----------------
 
 To run on a local server, use `./run_snakemake.sh` script, and add rule names
 and additional parameters, e.g. `./run_snakemake.sh --dryrun`.
-
-[Documentation for `run_snakemake_sge.sh` TK]
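
Note on the "intronless" assembly described in the new README text: the idea of excising predicted introns from a contig can be illustrated with a minimal Python sketch. This is not Intronarrator's implementation; the function name `remove_introns` and the toy sequence are hypothetical, and the sketch only assumes that intron coordinates are 1-based and inclusive, as in GFF3.

```python
from typing import Iterable, List, Tuple


def remove_introns(seq: str, introns: Iterable[Tuple[int, int]]) -> str:
    """Return seq with the given intervals excised.

    Intervals are 1-based and inclusive (GFF3 convention). Overlapping or
    directly adjacent intervals are merged before excision.
    """
    merged: List[List[int]] = []
    for start, end in sorted(introns):
        if start < 1 or end > len(seq) or start > end:
            raise ValueError(f"invalid interval: {start}-{end}")
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    pieces = []
    cursor = 0  # 0-based index of the next base to keep
    for start, end in merged:
        pieces.append(seq[cursor:start - 1])  # bases before the intron
        cursor = end  # skip past the intron
    pieces.append(seq[cursor:])  # remainder after the last intron
    return "".join(pieces)


if __name__ == "__main__":
    # Hypothetical toy contig with one "intron" at positions 4-6.
    print(remove_introns("AAACCCGGGTTT", [(4, 6)]))  # prints AAAGGGTTT
```

Coordinates of any downstream annotation then refer to the shortened sequence, which is presumably why the intron GFF is kept as a separate input in the config.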

run_snakemake.sh

Lines changed: 3 additions & 3 deletions
@@ -11,14 +11,14 @@ set -e
 # * Conda environments will be created in a subfolder `envs/`
 
 # PATHS
-SNAKEMAKE_ENV=
-WD=
+SNAKEMAKE_ENV=/ebio/ag-swart/home/kbseah/anaconda3/envs/snakemake
+WD=/ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/pogigwasc_intronless
 
 # activate snakemake conda environment
 source activate $SNAKEMAKE_ENV
 
 snakemake \
-  --cores 24 \
+  --cores 16 \
   --configfile $WD/workflow/config.yaml \
   --use-conda \
   --conda-frontend mamba \
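
Note on the hard-coded `WD` above: anyone re-running the workflow needs to point it at their own working folder. A minimal Python sketch of assembling the same invocation from a configurable working folder is given here. It only reproduces the options visible in this hunk (the hunk is truncated, so the real script may pass more), and reading `WD` from an environment variable is an assumption of the sketch, not something the script does.

```python
import os
import shlex
from typing import List


def build_snakemake_command(workdir: str, cores: int = 16) -> List[str]:
    """Assemble the snakemake arguments visible in the hunk above."""
    config_path = os.path.join(workdir, "workflow", "config.yaml")
    return [
        "snakemake",
        "--cores", str(cores),
        "--configfile", config_path,
        "--use-conda",
        "--conda-frontend", "mamba",
    ]


if __name__ == "__main__":
    # Assumption: WD may be supplied via the environment; the original
    # script hard-codes it instead. Falls back to the current directory.
    workdir = os.environ.get("WD", os.getcwd())
    print(shlex.join(build_snakemake_command(workdir)))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Snakemake installed.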

workflow/config.yaml

Lines changed: 4 additions & 4 deletions
@@ -1,9 +1,9 @@
 falcon-comb_LmagMIC:
-  ref_orig:
+  ref_orig: # Original reference genome assembly
     /ebio/abt2_projects/ag-swart-loxodes/assembly/falcon-comb_LmagMIC/scaffolds.fasta
-  ref_intronless_masked:
+  ref_intronless_masked: # Intronless assembly produced by Intronarrator
     /ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/intronarrator/falcon-comb_LmagMIC.0.2.minus_introns.ncRNA_hard_masked.fa
-  realtrons_gff:
+  realtrons_gff: # Intron annotation GFF3 file produced by intronarrator
     /ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/intronarrator/all.realtrons.0.2.noalt.gff
-  trf_min1000:
+  trf_min1000: # Low-complexity sequence annotation GFF3
     /ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/trf/falcon-comb_LmagMIC.trf.no_overlap.min1000.merge.bed
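
Note on the config paths: since these are local paths from the original analysis, it may help to check which ones exist before starting a run. The Python sketch below assumes the config has already been loaded into a nested dictionary of assembly name, then key, then path (for example with PyYAML's `yaml.safe_load`, which is not shown here so that the block needs only the standard library). The function name `find_missing_inputs` and the placeholder paths are hypothetical.

```python
import os
from typing import Dict, List, Mapping


def find_missing_inputs(
    config: Mapping[str, Mapping[str, str]]
) -> Dict[str, List[str]]:
    """Map each assembly name to the config keys whose paths do not exist."""
    missing: Dict[str, List[str]] = {}
    for assembly, entries in config.items():
        absent = [key for key, path in entries.items()
                  if not os.path.exists(path)]
        if absent:
            missing[assembly] = absent
    return missing


if __name__ == "__main__":
    # Placeholder paths for illustration only; substitute your own.
    example_config = {
        "falcon-comb_LmagMIC": {
            "ref_orig": "/path/to/scaffolds.fasta",
            "ref_intronless_masked": "/path/to/intronless.fa",
            "realtrons_gff": "/path/to/realtrons.gff",
            "trf_min1000": "/path/to/trf.bed",
        }
    }
    for assembly, keys in find_missing_inputs(example_config).items():
        print(f"{assembly}: missing {', '.join(keys)}")
```

An empty result means every configured path was found; anything reported should be replaced in `workflow/config.yaml` before running the pipeline.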
