Alex Song edited this page Aug 13, 2025 · 3 revisions

Welcome to MetaWorks 2.0+ Wiki

Enhanced MetaWorks Pipeline: An improved and extended version of the original MetaWorks multi-marker metabarcode processing pipeline

🚀 What's New in This Fork

This enhanced version of MetaWorks builds upon the excellent foundation established by Porter & Hajibabaei (2022), adding significant improvements:

  • 🐳 Docker Containerization: Complete containerized environment eliminates dependency conflicts and ensures reproducible results across systems
  • ⚡ Parallel RDP Classifier: Multi-threaded taxonomic assignment reduces processing time for large datasets
  • 🧩 Modular Architecture: Rules split into logical modules (preprocessing, denoising, pseudogene filtering) for easier maintenance and customization
  • 🔧 Flexible Sample Input: Support for both CSV-based sample sheets and folder-based sample discovery
  • 🐍 Enhanced Python Scripts: Modernized Python 3 scripts with improved error handling and performance
  • 📊 Advanced Statistics: Comprehensive quality metrics and processing statistics at each pipeline step
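
To make the two sample input modes concrete, here is a minimal sketch of how a sample sheet and a folder scan could both resolve to the same sample-to-reads mapping. This is not the fork's actual code: the column names (`sample`, `r1`, `r2`), the `_R1`/`_R2` file naming pattern, and both function names are assumptions made for illustration.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq(\.gz)?$")


def samples_from_csv(sheet_path):
    """Read a sample sheet with the assumed columns: sample, r1, r2."""
    with open(sheet_path, newline="") as handle:
        return {row["sample"]: (row["r1"], row["r2"]) for row in csv.DictReader(handle)}


def samples_from_folder(folder):
    """Pair forward and reverse read files in a folder by shared sample prefix."""
    found = {}
    for path in sorted(Path(folder).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match:
            found.setdefault(match["sample"], {})[match["read"]] = str(path)
    # Keep only samples for which both reads were found
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in found.items()
        if "1" in reads and "2" in reads
    }
```

Both functions return a dictionary keyed by sample name, so downstream steps would not need to know which input mode was used. See the Configuration page for the options the pipeline actually exposes.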

📋 Quick Navigation

| Section | Description |
|---------|-------------|
| Installation Guide | Step-by-step installation instructions |
| Quick Start | Get running in 10 minutes |
| Supported Markers | Complete list of supported metabarcoding markers |
| Workflows | Detailed workflow documentation |
| Configuration | Pipeline configuration options |
| Troubleshooting | Common issues and solutions |
| API Reference | Command-line interface documentation |
| Examples | Real-world usage examples |
| Contributing | How to contribute to this project |

🧬 Supported Metabarcoding Markers

| Marker | Target Taxa | Classifier Available |
|--------|-------------|----------------------|
| COI | Animals, Eukaryotes | ✅ |
| 16S rRNA | Bacteria, Archaea | ✅ |
| ITS | Fungi | ✅ |
| rbcL | Plants, Diatoms | ✅ |
| 12S | Fish, Vertebrates | ✅ |
| 18S | Eukaryotes | ✅ |
| 28S | Fungi | ✅ |

⚡ Key Features

Multi-Marker Processing

Process multiple metabarcoding markers in a single, harmonized workflow using consistent bioinformatic approaches across all supported markers.

Exact Sequence Variants (ESVs)

Generate high-resolution ESVs or traditional OTUs with taxonomic assignments and confidence scores using the RDP Classifier.
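
As a rough illustration of how per-rank confidence scores are typically used, the sketch below trims a lineage at the first rank whose confidence falls below a cutoff. The function name, the `(rank, name, confidence)` layout, the 0.8 cutoff, and the example values are all assumptions made for illustration, not the pipeline's output format or a recommended threshold.

```python
def trim_lineage(lineage, cutoff=0.8):
    """Keep ranks from the top down until confidence drops below the cutoff."""
    kept = []
    for rank, name, confidence in lineage:
        if confidence < cutoff:
            break  # lower ranks are not reported once support is too weak
        kept.append((rank, name))
    return kept


# Illustrative values only; not real classifier output
lineage = [
    ("phylum", "Arthropoda", 1.00),
    ("class", "Insecta", 0.99),
    ("order", "Ephemeroptera", 0.95),
    ("family", "Baetidae", 0.83),
    ("genus", "Baetis", 0.61),
]

print(trim_lineage(lineage, cutoff=0.8))
```

With these made-up values the genus is dropped and the assignment is reported to family. Suitable cutoffs depend on the marker, the rank, and the sequence length, so check the documentation for the classifier you are using.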

Scalable Architecture

Built with Snakemake for reproducible, scalable processing on everything from laptops to high-performance computing clusters.
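
Scaling usually comes down to splitting work into independent pieces that can run side by side. The sketch below shows that general pattern for a set of sequences: split into chunks, process the chunks concurrently, and merge the results in the original order. It is a generic illustration, not how this pipeline or Snakemake schedules jobs; all function names are hypothetical, and `classify_chunk` is a placeholder that only reports sequence length.

```python
from concurrent.futures import ThreadPoolExecutor


def read_fasta(text):
    """Parse FASTA-formatted text into (header, sequence) records."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif line:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def chunked(records, n_chunks):
    """Split records into at most n_chunks order-preserving chunks."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def classify_chunk(chunk):
    """Placeholder for a real classifier call; returns each sequence's length."""
    return [(header, len(seq)) for header, seq in chunk]


def classify_parallel(records, workers=4):
    """Process chunks concurrently and merge the results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(classify_chunk, chunked(records, workers)))
    return [item for chunk_result in results for item in chunk_result]
```

Because `pool.map` returns results in submission order, the merged output lines up with the input regardless of which chunk finishes first.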

Specialized Processing

  • ITS markers: Automatic removal of flanking conserved rRNA regions
  • Protein-coding markers: Pseudogene filtering using profile HMMs
  • Quality control: Comprehensive read quality assessment and filtering

🎯 Use Cases

Research Applications

  • Biodiversity assessments
  • Environmental monitoring
  • Ecological studies
  • Taxonomic surveys

Operational Applications

  • Biomonitoring programs
  • Environmental impact assessments
  • Water quality monitoring
  • Conservation projects

📄 Original Citation

If you use this enhanced pipeline, please cite the original MetaWorks paper:

Porter, T. M., & Hajibabaei, M. (2022). MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments. PLOS ONE, 17(9), e0274260. [doi:10.1371/journal.pone.0274260](https://doi.org/10.1371/journal.pone.0274260)

Additional Citations:

  • COI classifier: Porter, T. M., & Hajibabaei, M. (2018). Scientific Reports, 8, 4226.
  • Pseudogene filtering: Porter, T. M., & Hajibabaei, M. (2021). BMC Bioinformatics, 22, 256.
  • RDP classifier: Wang, Q., et al. (2007). Applied and Environmental Microbiology, 73(16), 5261–5267.

🀝 Community & Support

πŸ”— Related Resources


Getting Started: Ready to dive in? Head to our Installation Guide or Quick Start to begin processing your metabarcoding data.