FunDiS Pipeline is a suite of scripts intended to streamline the processing of Next-Generation Sequencing (NGS) data. The scripts can be run individually or as a whole to form a complete pipeline.

## Prerequisites

This application is designed to be run on a Linux/WSL environment and requires the following Python libraries:

- psutil
- tqdm
- pandas
- pysam
- biopython

The application also relies on the following tools:

- NGSpeciesID
- bwa
- samtools
- bcftools
- whatshap
- medaka
- openblas
- spoa

Note: The application checks for the required Python libraries and tools during execution and attempts to install any missing dependencies.
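
To see ahead of time what that check would report, the following is a minimal sketch of a dependency check. It is not the pipeline's own code, and it only reports what is missing rather than installing anything. The helper names `missing_modules` and `missing_tools` are hypothetical. The sketch assumes that biopython is imported as `Bio` and that each listed tool exposes an executable of the same name on `PATH`; openblas is left out because it is a shared library rather than a command.

```python
import importlib.util
import shutil

# Import names, which can differ from package names (biopython imports as "Bio").
PYTHON_MODULES = ["psutil", "tqdm", "pandas", "pysam", "Bio"]

# Assumed executable names; openblas is omitted because it is a library, not a command.
TOOLS = ["NGSpeciesID", "bwa", "samtools", "bcftools", "whatshap", "medaka", "spoa"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the executable names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(PYTHON_MODULES))
    print("Missing tools:", missing_tools(TOOLS))
```

An empty list in both lines of output suggests the environment already satisfies the prerequisites listed above.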

## Running the Pipeline

Each module of the pipeline can be run individually or as a whole.

### Running the Whole Pipeline

To run the whole pipeline, use the `fundis_main.py` script. For example:
Each module can also be run individually. Here's what each module does:

- **fundis_minibar_ngsid.py**: This script processes the input FASTQ file with MiniBar and NGSpeciesID. MiniBar is a tool for demultiplexing barcoded read data and NGSpeciesID is a tool used for the identification of specimens in NGS datasets. The script starts by checking the operating system, installing missing libraries, and setting up the working environment. It then moves on to demultiplexing and identifying species from the input FASTQ data. The results are output in a directory specified by the user.
- **fundis_haplotype_phaser.py**: This script takes the output from the `fundis_minibar_ngsid.py` script and phases the haplotypes for each sample. Phasing is the process of determining the specific set of variants found on each physical copy of a particular gene or genomic region. The phased haplotypes are output in the NGSpeciesID output directory.
- **fundis_summarize2.py**: This script summarizes the output from the `fundis_haplotype_phaser.py` script. It provides a summary of the results, including counts of unique samples, total consensus sequences, and total reads in consensus sequences. It also copies and updates the names of all FASTQ and consensus FASTA files. The results are output in a summary directory named after the source directory.
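
As a rough illustration of the kind of counting the summary step describes, the sketch below tallies unique samples and consensus sequences from FASTA text. It is not taken from `fundis_summarize2.py`: the function name `summarize_consensus` and the input shape (a dictionary mapping a sample name to that sample's consensus FASTA text) are assumptions made for this example. Read counts are left out because the description above does not say how the script derives them.

```python
def summarize_consensus(fasta_by_sample):
    """Count samples and consensus records.

    fasta_by_sample is a hypothetical input: a dict mapping each sample
    name to the text of its consensus FASTA file.
    """
    total_sequences = 0
    for fasta_text in fasta_by_sample.values():
        # Every FASTA record starts with a header line beginning with ">".
        total_sequences += sum(
            1 for line in fasta_text.splitlines() if line.startswith(">")
        )
    return {
        "unique_samples": len(fasta_by_sample),
        "total_consensus_sequences": total_sequences,
    }


if __name__ == "__main__":
    # Made-up sample names and sequences, for illustration only.
    example = {
        "sample_A": ">consensus_1\nACGT\n>consensus_2\nACGA\n",
        "sample_B": ">consensus_1\nTTTT\n",
    }
    print(summarize_consensus(example))
```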