|
1 |
| -Template workflow folder for Snakemake pipeline |
2 |
| -=============================================== |
| 1 | +Pogigwasc gene prediction of Loxodes magnus genome |
| 2 | +=================================================== |
3 | 3 |
|
4 |
| -After cloning this repository, you should change the name of the folder as |
5 |
| -appropriate, and update the remote URL of the repository to a new one for your |
6 |
| -project. |
| 4 | +Snakemake pipeline for gene prediction for Loxodes magnus, which has a genetic |
| 5 | +code with context-dependent stop codons. Introns are first empirically |
| 6 | +predicted with [Intronarrator](https://github.com/Swart-lab/Intronarrator) and |
| 7 | +artificially removed to produce an "intronless" assembly, to run
| 8 | +[Pogigwasc](https://github.com/Swart-lab/pogigwasc) in `--no-intron` mode. This |
| 9 | +is because the short lengths and unusual length distribution of introns in |
| 10 | +Loxodes are difficult to model with the GHMM in Pogigwasc. |
7 | 11 |
|
| 12 | +Data |
| 13 | +---- |
8 | 14 |
|
9 |
| -Suggested setup |
10 |
| ---------------- |
| 15 | +Pipeline and scripts to generate the genome assembly are available from the
| 16 | +[loxodes-assembly-workflow](https://github.com/Swart-lab/loxodes-assembly-workflow)
| 17 | +repository. The pipeline for the "intronless" assembly is available from
| 18 | +[loxodes-intronarrator-workflow](https://github.com/Swart-lab/loxodes-intronarrator-workflow). |
11 | 19 |
|
12 |
| -```bash |
13 |
| -git clone git@github.com:Swart-lab/snakemake-template.git |
14 |
| -mv snakemake-template my-project # rename project folder |
15 |
| -cd my-project |
16 |
| -mkdir data # folder to put project data, gitignored |
17 |
| -mkdir envs # folder for Conda envs produced by workflow, gitignored |
18 |
| -mkdir tmp # folder for temp files, gitignored |
19 |
| -mkdir nb # folder for computational notebooks etc. |
20 |
| -git remote remove origin # remove template repo as a remote |
21 |
| -``` |
| 20 | +This pipeline was used to annotate the MAC and MIC genomes; the path
| 21 | +to the reference assembly and the names of output files were modified accordingly.
22 | 22 |
|
23 |
| -Edit the files `run_snakemake.sh` and/or `run_snakemake_sge.sh` to include |
24 |
| -absolute paths to the working folder and to a Conda environment with Snakemake, |
25 |
| -and modify other settings (e.g. max number of CPUs) as required. |
| 23 | +Paths to input files in the `workflow/config.yaml` file are local paths used in |
| 24 | +the original data analysis. When re-running the pipeline, replace these with |
| 25 | +the actual paths on your system. |
26 | 26 |
|
27 |
| -Snakemake rules and config files are in the `workflow/` subfolder. |
| 27 | +Curated output from this annotation is included in the [archive of genome
| 28 | +annotations](https://doi.org/10.17617/3.9QTROS). |
28 | 29 |
|
29 | 30 |
|
30 | 31 | Running workflow
|
31 | 32 | ----------------
|
32 | 33 |
|
33 | 34 | To run on a local server, use the `./run_snakemake.sh` script, and add rule names
|
34 | 35 | and additional parameters, e.g. `./run_snakemake.sh --dryrun`.
|
35 |
| - |
36 |
| -[Documentation for `run_snakemake_sge.sh` TK] |