Commit df89369

Update README.md
Updated to reflect GUI and current implementation. Signed-off-by: Ian M. B. <99409346+iPsychonaut@users.noreply.github.com>
1 parent 8ce5852 commit df89369

1 file changed

+69
-49
lines changed

README.md

Lines changed: 69 additions & 49 deletions
@@ -1,77 +1,97 @@
1+
# FunDiSPipe: Fungal Diversity Survey Pipeline
12

2-
# FunDiS Pipeline
3-
4-
FunDiS Pipeline is a suite of scripts intended to streamline the processing of Next-Generation Sequencing (NGS) data. The scripts can be run individually or as a whole to form a complete pipeline.
3+
FunDiSPipe is a bioinformatics pipeline built for the Fungal Diversity Survey (FunDiS) to analyze fungal ITS data from Oxford Nanopore Technologies sequencing. It streamlines the process from sequencing data to species identification and summarization. This is the main Graphical User Interface for a modified protocol developed by Stephen Douglas Russell (https://www.protocols.io/view/primary-data-analysis-basecalling-demultiplexing-a-dm6gpbm88lzp/v3?step=3); development of this pipeline was funded by the Fungal Diversity Survey (FunDiS).
54

65
## Prerequisites
76

87
This application is designed to run in a Linux/WSL environment and requires the following packages (Python libraries and command-line tools):
98

10-
- psutil
11-
- tqdm
12-
- pandas
13-
- pysam
14-
- biopython
15-
- multiprocessing
16-
- math
17-
- queue
18-
- glob
19-
- shutil
9+
- openblas==0.3.3
10+
- biopython==1.81
11+
- samtools==1.18
12+
- minimap2==2.26
13+
- bcftools==1.17
14+
- bwa==0.7.17
15+
- whatshap==2.1
16+
- spoa==4.1.3
17+
- racon==1.5.0
18+
- pyvcf==0.6.8
19+
- termcolor==2.3.0
20+
- gdown==4.7.1
2021

2122
The application also relies on the following tools:
2223

23-
- NGSpeciesID
24-
- bwa
25-
- samtools
26-
- bcftools
27-
- whatshap
28-
- medaka
29-
- openblas
30-
- spoa
24+
- NGSpeciesID (https://github.com/ksahlin/NGSpeciesID)
25+
- medaka (https://github.com/nanoporetech/medaka)
3126

32-
Note: The application checks for the required Python libraries and tools during execution and attempts to install any missing dependencies.
27+
Note: Running fundis_setup.sh checks for the required Python libraries and tools and attempts to install any that are missing.
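
To confirm the result of the setup step, a quick check of which command-line tools are visible on `PATH` can help. The following is a minimal sketch and not part of FunDiSPipe: the function name `find_missing_tools` is hypothetical, and it assumes each tool's executable shares the name of the package listed above, which may not hold for every install.

```python
import shutil

# Executable names are an assumption: they mirror the package names listed above.
REQUIRED_TOOLS = [
    "samtools", "minimap2", "bcftools", "bwa",
    "whatshap", "spoa", "racon", "NGSpeciesID", "medaka",
]


def find_missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

If any tool is reported as missing, re-running `fundis_setup.sh` in the active environment is the first thing to try.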
3328

34-
## Running the Pipeline
29+
## Installation
3530

36-
Each module of the pipeline can be run individually or as a whole.
31+
To install FunDiSPipe, follow these steps after cloning the GitHub repository:
3732

38-
### Running the Whole Pipeline
33+
```bash
34+
sudo apt-get install dos2unix &&
35+
dos2unix ./fundis_setup.sh &&
36+
chmod +x ./fundis_setup.sh &&
37+
./fundis_setup.sh
38+
```
3939

40-
To run the whole pipeline, use the `fundis_main.py` script. For example:
40+
## Modules and Their Functionalities
4141

42-
```
43-
python /path/to/fundis_main.py -i /path/to/input.fastq -x /path/to/index.txt -t /path/to/primers.txt -p 80
44-
```
42+
1. **GUI (FunDiS_GUI.py)**:
43+
- Acts as the central interface for the pipeline.
44+
- Facilitates file selection, process initiation, and result visualization.
45+
- Integrates other modules for a seamless workflow.
4546

46-
### Running Individual Modules
47+
2. **Mini-Barcoder (FunDiS_Minibar.py)** (https://github.com/calacademy-research/minibar):
48+
- Prepares `.fastq.gz` files for species identification.
49+
- Extracts and processes sequences from raw data.
50+
- Essential for initial data preparation and quality control.
4751

48-
Each module can also be run individually. Here's what each module does:
52+
3. **NGSpeciesID (FunDiS_NGSpeciesID.py)**:
53+
- Identifies species from processed sequencing data.
54+
- Utilizes advanced algorithms for accurate species matching.
55+
- Outputs detailed reports on identified species and their characteristics.
4956

50-
- **fundis_minibar_ngsid.py**: This script processes the input FASTQ file with MiniBar and NGSpeciesID. MiniBar is a tool for demultiplexing barcoded read data and NGSpeciesID is a tool used for the identification of specimens in NGS datasets. The script starts by checking the operating system, installing missing libraries, and setting up the working environment. It then moves on to demultiplexing and identifying species from the input FASTQ data. The results are output in a directory specified by the user.
57+
4. **Haplotype Phaser (FunDiS_hap_phase.py)**:
58+
- Resolves haplotype variations in sequencing data.
59+
- Enhances species identification accuracy.
60+
- Critical for detailed genetic analysis and research.
5161

52-
```
53-
python /path/to/fundis_minibar_ngsid.py -i /path/to/input.fastq -x /path/to/index.txt -t /path/to/primers.txt -p 80
54-
```
62+
5. **MycoMap Summarizer (MycoMap_Summarize.py)**:
63+
- Aggregates results from the entire pipeline.
64+
- Produces comprehensive summary reports for analysis and interpretation.
65+
- Simplifies data review and sharing.
5566

56-
- **fundis_haplotype_phaser.py**: This script takes the output from the `fundis_minibar_ngsid.py` script and phases the haplotypes for each sample. Phasing is the process of determining the specific set of variants found on each physical copy of a particular gene or genomic region. The phased haplotypes are output in the NGSpeciesID output directory.
67+
## Inputs and Outputs
5768

58-
```
59-
python /path/to/fundis_haplotype_phaser.py -i /path/to/input_dir -p 80
60-
```
69+
- **Input**: `.fastq.gz` file containing sequences basecalled with Oxford Nanopore's Guppy.
70+
- **Outputs**:
71+
- Processed and quality-checked sequence data.
72+
- Species identification reports and detailed analysis.
73+
- Summarized outputs and aggregated data for further study.
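
Before loading an input file into the GUI, it can be useful to confirm that it is a readable gzipped FASTQ file. The sketch below is illustrative only and is not taken from the FunDiSPipe scripts: `count_fastq_reads` is a hypothetical helper, and it assumes the common four-lines-per-record FASTQ layout with no wrapped sequence lines.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming four lines per record."""
    line_count = 0
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Every fourth line, starting with the first, should be a header.
            if line_number % 4 == 0 and not line.startswith("@"):
                raise ValueError(
                    f"Line {line_number + 1} is not a FASTQ header: {line.strip()!r}"
                )
            line_count += 1
    if line_count % 4 != 0:
        raise ValueError("File ends with an incomplete FASTQ record.")
    return line_count // 4
```

Calling `count_fastq_reads("/path/to/input.fastq.gz")` returns the number of reads, or raises `ValueError` if the file does not follow the assumed layout.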
6174

62-
- **fundis_summarize.py**: This script summarizes the output from the `fundis_haplotype_phaser.py` script. It provides a summary of the results, including counts of unique samples, total consensus sequences, and total reads in consensus sequences. It also copies and updates the names of all FASTQ and consensus FASTA files. The results are output in a summary directory named after the source directory.
75+
## Usage
6376

64-
```
65-
python /path/to/fundis_summarize.py -i /path/to/input_dir -p 80
66-
```
77+
Start by navigating the GUI to select your input files. Here’s a brief guide on using each module:
78+
79+
- **GUI**: Launch the GUI script to access the pipeline's functionalities. The interface is intuitive and guides you through the process.
80+
- **Mini-Barcoder**: After selecting your `.fastq.gz` file, this module will prepare it for the NGSpeciesID analysis.
81+
- **NGSpeciesID**: Once the data is prepped, use this module for species identification. The output will include detailed species information.
82+
- **Haplotype Phaser**: To delve deeper into the genetic analysis, use this module for haplotype phasing.
83+
- **MycoMap Summarizer**: Finally, to aggregate and summarize your results, use this module. It consolidates the data into an easy-to-interpret format.
84+
85+
For detailed instructions and options for each module, refer to the comments and documentation within each script file. These instructions provide guidance on executing the scripts and customizing the analysis to your requirements.
86+
87+
## Contributing
88+
89+
Contributions are welcome. Please follow standard coding practices and clearly document any changes or enhancements.
6790

68-
## Arguments
91+
## License
6992

70-
- `-i`, `--input_fastq` or `--input_dir`: Path to the FASTQ file containing ONT nrITS data or path to the directory containing the data.
71-
- `-t`, `--primers_text_path`: Path to Text file containing the Primers used to generate input_fastq.
72-
- `-x`, `--minbar_index_path`: Path to Text file containing the minibar index to parse input_fastq.
73-
- `-p`, `--percent_system_use`: Percent system use written as an integer.
93+
Please see the LICENSE file in the GitHub repository for detailed licensing information.
7494

7595
## Author
7696

77-
Ian Michael Bollinger (ian.michael.bollinger@gmail.com/researchconsultants@critical.consulting)
97+
ian.michael.bollinger@gmail.com/researchconsultants@critical.consulting
