A Nextflow pipeline to perform quality control, alignment, and feature coverage of CUT&Tag sequencing data.

vonMeyennLab/nf_cutntag

CUT&Tag Sequencing Pipeline

A Nextflow pipeline to perform quality control, alignment, and feature coverage of CUT&Tag sequencing data.

The pipeline was created to run on the ETH Euler cluster and relies on that cluster's genome files. It therefore needs to be adapted before it can run on a different HPC cluster.

Pipeline steps

  1. FastQC
  2. FastQ Screen
  3. Trim Galore
  4. FastQC
  5. Bowtie2
  6. Samtools sort
  7. picard MarkDuplicates
  8. Samtools index
  9. bedtools genomecov
  10. MultiQC

Required parameters

Path to the FASTQ files: the folder where they are located, followed by a file pattern that matches them.

--input /cluster/work/nme/data/josousa/project/fastq/*fastq.gz

Output directory where the files will be saved.

--outdir /cluster/work/nme/data/josousa/project
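
As a rough illustration of how the parameters in this README combine into one command line, the sketch below assembles a `nextflow run` invocation in Python. The helper `build_command` is hypothetical, and launching the pipeline by its repository name (`vonMeyennLab/nf_cutntag`) is an assumption; replace it with the path to your local copy of the pipeline if that is how you run it.

```python
import shlex


def build_command(input_glob, outdir, genome=None, single_end=False,
                  pipeline="vonMeyennLab/nf_cutntag"):
    """Assemble a `nextflow run` argument list (hypothetical helper).

    The `pipeline` default is an assumption; the flags are the ones
    documented in this README.
    """
    cmd = ["nextflow", "run", pipeline,
           "--input", input_glob,
           "--outdir", outdir]
    if genome is not None:
        cmd += ["--genome", genome]
    if single_end:
        cmd.append("--single_end")
    return cmd


cmd = build_command(
    "/cluster/work/nme/data/josousa/project/fastq/*fastq.gz",
    "/cluster/work/nme/data/josousa/project",
    genome="Mus_musculus_GRCm39",
)

# shlex.join quotes the glob so the shell passes it to Nextflow unexpanded.
print(shlex.join(cmd))
```

Quoting the `--input` pattern matters: without quotes, the shell expands the glob before Nextflow sees it.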

Input optional parameters

  • Option to force the pipeline to treat the input as single-end.

    --single_end

    By default, the pipeline detects whether the input files are single-end or paired-end.
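
The README does not say how the detection works. Purely as an illustration, the sketch below shows one common way to tell paired-end from single-end input by file name. The `_R1`/`_R2` naming convention and the helpers `group_reads` and `is_paired` are assumptions, not the pipeline's actual logic.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1<rest> and <sample>_R2<rest>.
READ_TAG = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])(?P<rest>[_.].*)$")


def group_reads(filenames):
    """Group FASTQ file names by sample and read number (hypothetical helper)."""
    groups = defaultdict(dict)
    for name in filenames:
        match = READ_TAG.match(name)
        if match:
            groups[match["sample"]][match["read"]] = name
        else:
            # No read tag found: treat the file as a lone single-end sample.
            groups[name]["1"] = name
    return dict(groups)


def is_paired(groups):
    """Return True only if every sample has both an R1 and an R2 file."""
    return bool(groups) and all(set(reads) == {"1", "2"}
                                for reads in groups.values())


paired = group_reads(["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"])
single = group_reads(["sampleB.fastq.gz"])
print(is_paired(paired), is_paired(single))
```

If your files follow a different convention, passing `--single_end` explicitly avoids relying on any detection.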

Genomes

  • Reference genome used for alignment.

    --genome

    Available genomes:

        Mus_musculus_GRCm39 # Default
        Mus_musculus_GRCm38_p6
        Homo_sapiens_GRCh38_p14
        Rattus_norvegicus_mRatBN7_2
        Bos_taurus_ARS-UCD1_2
        Bos_taurus_ARS-UCD1_3
        Caenorhabditis_elegans_WBcel235
        Callithrix_jacchus_mCalJac1_pat_X
        Capra_hircus_ARS1
        Capreolus_capreolus_GCA_951849835_1
        Escherichia_coli_ASM160652v1
        Macaca_fascicularis_Macaca_fascicularis_6_0
        Macaca_mulatta_Mmul_10
        Monodelphis_domestica_ASM229v1
        Pan_troglodytes_Pan_tro_3_0
        Saccharomyces_cerevisiae_R64-1-1
        Sus_scrofa_Sscrofa11_1
  • Option to use a custom genome for alignment by providing an absolute path to a custom genome file.

    --custom_genome_file '/cluster/work/nme/data/josousa/project/genome/GRCm39.genome'

    Example of a genome file:

    name           GRCm39
    species        Mouse
    bowtie2        /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/Bowtie2Index/genome
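
To check a custom genome file before starting a run, a short parser like the one below may help. It is a minimal sketch that assumes the format is exactly what the example shows: one key per line, followed by whitespace and a value. The function name `parse_genome_file` is hypothetical and is not part of the pipeline.

```python
def parse_genome_file(text):
    """Parse 'key <whitespace> value' lines into a dict (assumed format)."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        entries[key] = value
    return entries


example = """\
name           GRCm39
species        Mouse
bowtie2        /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/Bowtie2Index/genome
"""

genome = parse_genome_file(example)

# Report any of the three keys from the README example that are missing.
missing = [key for key in ("name", "species", "bowtie2") if not genome.get(key)]
print(genome["name"], "missing keys:", missing)
```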

FastQ Screen optional parameters

  • Option to provide a custom FastQ Screen config file.

    --fastq_screen_conf '/cluster/work/nme/software/config/fastq_screen.conf' # Default

Bowtie2 optional parameters

  • Option to suppress SAM records for unaligned reads.

    --bowtie2_no_unal Default: true

  • By default, Bowtie2 runs with the following parameters, adapted for CUT&Tag sequencing:

    --bowtie2_args="--local --very-sensitive-local --minins 10 --maxins 700"

picard MarkDuplicates optional parameters

  • Option to leave duplicates out of the output file instead of writing them with the appropriate flags set.

    --picard_markduplicates_remove_duplicates Default: false

  • Option to remove 'optical' duplicates and other duplicates that appear to have arisen from the sequencing process instead of the library preparation process, even if REMOVE_DUPLICATES is false. If REMOVE_DUPLICATES is true, all duplicates are removed and this option is ignored.

    --picard_markduplicates_remove_sequencing_duplicates Default: false
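
The interaction between the two options can be summarised with a small model. The sketch below does not call Picard. The `Read` class and `reads_written` function are hypothetical, and they only encode the rules stated above for which reads end up in the output file.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    is_duplicate: bool = False
    # True for 'optical' and other sequencing-process duplicates.
    is_sequencing_duplicate: bool = False


def reads_written(reads, remove_duplicates=False,
                  remove_sequencing_duplicates=False):
    """Return the reads that would be written to the output file."""
    kept = []
    for read in reads:
        if remove_duplicates and read.is_duplicate:
            continue  # all duplicates are dropped; the other option is moot
        if (remove_sequencing_duplicates and read.is_duplicate
                and read.is_sequencing_duplicate):
            continue  # only sequencing-process duplicates are dropped
        kept.append(read)  # everything else is written (duplicates stay flagged)
    return kept


reads = [
    Read("unique"),
    Read("library_dup", is_duplicate=True),
    Read("optical_dup", is_duplicate=True, is_sequencing_duplicate=True),
]

for flags in [(False, False), (False, True), (True, False), (True, True)]:
    names = [r.name for r in reads_written(reads, *flags)]
    print(flags, names)
```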

bedtools genomecov optional parameters

  • Option to report depth in BedGraph format, like the bedtools option '-bg', but with regions of zero coverage also reported (bedtools option '-bga'). This allows one to quickly extract all regions of a genome with zero coverage by applying 'grep -w 0$' to the output.

    --bedtools_genomecov_bga Default: true
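
As an alternative to the `grep` one-liner, the zero-coverage regions can also be pulled out of the BedGraph output in Python. This is a minimal sketch that assumes the standard four-column BedGraph layout (chromosome, start, end, depth). The function name `zero_coverage_regions` and the sample data are made up for illustration.

```python
def zero_coverage_regions(bedgraph_text):
    """Return (chrom, start, end) for every BedGraph interval with depth 0."""
    regions = []
    for line in bedgraph_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank lines and short header lines
        chrom, start, end, depth = fields[:4]
        if float(depth) == 0:
            regions.append((chrom, int(start), int(end)))
    return regions


# Made-up sample in the format produced with zero-coverage reporting enabled.
sample = "chr1\t0\t100\t0\nchr1\t100\t150\t3\nchr1\t150\t200\t0\n"

zero = zero_coverage_regions(sample)
print(zero)
```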

Skipping options

  • Option to skip FastQC, Trim Galore, and FastQ Screen. The first step of the pipeline will then be the Bowtie2 alignment.

    --skip_qc

  • Option to skip FastQ Screen.

    --skip_fastq_screen

Extra arguments

  • Option to add extra arguments to FastQC. --fastqc_args

  • Option to add extra arguments to FastQ Screen. --fastq_screen_args

  • Option to add extra arguments to Trim Galore. --trim_galore_args

  • Option to add extra arguments to the Bowtie2 aligner. --bowtie2_args

  • Option to add extra arguments to Samtools sort. --samtools_sort_args

  • Option to add extra arguments to picard MarkDuplicates. --mark_duplicates_args

  • Option to add extra arguments to Samtools index. --samtools_index_args

  • Option to add extra arguments to bedtools genomecov. --bedtools_genomecov_args

  • Option to add extra arguments to MultiQC. --multiqc_args

Acknowledgements

This pipeline was adapted from the Nextflow pipelines created by the Babraham Institute Bioinformatics Group and from the nf-core pipelines. We thank all the contributors to both projects. We also thank the Nextflow community and the nf-core community for all their help and support.
