quantms, ms2rescore and multiple search engines enables deep proteome coverage across protein quantification, immunopeptidomics, and phosphoproteomics experiments
This repository contains the manuscript and supporting materials for the research paper "quantms, ms2rescore and multiple search engines enables deep proteome coverage across protein quantification, immunopeptidomics, and phosphoproteomics experiments" by:
- Chengxin Dai¹'² - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China; International Academy of Phronesis Medicine (Guangdong), Guangdong, China
- Ralf Gabriels³ - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Robbin Bouwmeester³ - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Jonas Scheid⁴'⁵'⁶'⁷ - Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany; Cluster of Excellence iFIT (EXC2180) "Image-Guided and Functionally Instructed Tumor Therapies", University of Tübingen, Germany; Quantitative Biology Center (QBiC), University of Tübingen, Germany; Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Germany
- Lennart Martens³ - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Oliver Kohlbacher⁹ - Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Mingze Bai⁸ - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Timo Sachsenberg⁹ - Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Yasset Perez-Riverol¹⁰ - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
The exponential growth of public proteomics datasets has outpaced the capacity of traditional desktop tools for large-scale automated analysis. This study presents an integrated workflow combining quantms, a cloud-native pipeline, with MS2Rescore, a machine learning-driven rescoring tool, to enable deep reanalysis of massive proteomic datasets. Leveraging the Nextflow workflow engine for parallel computing, the pipeline integrates fragment ion intensity and retention time predictions from MS2PIP and DeepLC to optimize peptide-spectrum match reliability via Percolator.
- 16-22.8% increase in identified spectra compared to traditional approaches
- Hundreds of newly quantified proteins and phosphosites across diverse experimental designs
- Demonstrated improvements across:
  - Label-free quantification (LFQ)
  - TMT labeling
  - Immunopeptidomics
  - Phosphoproteomics
.
├── ms2rescore-quantms-mcp/
│   ├── main.tex              # Main manuscript LaTeX file
│   ├── main.pdf              # Compiled manuscript PDF
│   ├── references.bib        # Bibliography file
│   ├── mcp.bib               # Additional bibliography
│   └── figures/              # All manuscript figures
│       ├── PXD019643.png     # Immunopeptidomics results
│       ├── PXD001819.jpg     # LFQ benchmark results
│       ├── CPTAC_TMT.png     # TMT experiment results
│       ├── phospho2.png      # Phosphoproteomics results
│       ├── *_weights.png     # Feature weight analyses
│       └── PXD001819/        # Additional LFQ figures
├── README.md                 # This file
└── LICENSE                   # License information
- PXD001819: Sigma UPS1 proteins spiked into yeast lysate (LFQ benchmark)
- PXD019643: HLA Class I immunopeptides
- PXD026824: Phosphoproteomics dataset
- PDC000125: CPTAC TMT dataset
The study evaluated five different workflow settings:
- Comet alone
- Comet + MSGF+
- Comet + MS2Rescore
- Comet + MSGF+ + MS2Rescore
- Comet + MSGF+ + SAGE + MS2Rescore
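The five settings above differ only in which search engines run and whether rescoring is enabled. The sketch below prints one illustrative command line per setting. It is not taken from the manuscript: the `--search_engines` and `--ms2rescore` parameter names, the engine identifiers (`comet`, `msgf`, `sage`), and the `bigbio/quantms` pipeline name are assumptions that should be checked against the quantms documentation for the release you use. Required options such as the input SDRF, the protein database, the execution profile, and the output directory are deliberately omitted.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not run) one command per workflow setting.
# ASSUMPTION: parameter names and engine identifiers are illustrative and
# must be verified against the quantms documentation before use.

print_settings() {
    # Each row: human-readable label | search engines | rescoring on/off
    while IFS='|' read -r label engines rescore; do
        echo "nextflow run bigbio/quantms --search_engines $engines --ms2rescore $rescore   # $label"
    done <<'EOF'
Comet alone|comet|false
Comet + MSGF+|comet,msgf|false
Comet + MS2Rescore|comet|true
Comet + MSGF+ + MS2Rescore|comet,msgf|true
Comet + MSGF+ + SAGE + MS2Rescore|comet,msgf,sage|true
EOF
}

print_settings
```

Keeping the settings in a small table makes it easy to see that three of the five configurations enable rescoring and that only the last one adds SAGE.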
- quantms: Cloud-based pipeline for quantitative proteomics
- MS2Rescore: Machine learning-driven rescoring tool
- MS2PIP: Fragment ion intensity prediction
- DeepLC: Retention time prediction
- Percolator: Statistical validation and FDR control
- Nextflow: Workflow management system
- quantms pipeline
- MS2Rescore
- Nextflow
- OpenMS toolkit
- Percolator
All datasets used in this study are publicly available:
- PRIDE Archive: PXD001819, PXD019643, PXD026824
- CPTAC: PDC000125
This work follows FAIR (Findability, Accessibility, Interoperability, and Reusability) principles:
- All analyses use open-source tools
- Standard file formats employed throughout
- Reproducible execution environments
- Publicly available datasets
- Open workflow management with Nextflow
- 17% improvement with multiple search engines
- Additional 16% improvement with MS2Rescore features
- Enhanced quantification of low-abundance UPS1 proteins
- 3.6% improvement in PSM identification
- 921 newly quantified proteins
- 27 newly quantified differentially expressed proteins
- 11.7% improvement with multiple search engines
- Additional 22.8% improvement with MS2Rescore
- Clear separation of target and decoy PSMs
- 19% improvement in spectra identification
- 17% increase in phosphorylated peptides at 0.01 FLR
- 350 newly identified protein phosphorylation sites
If you use this work, please cite:
@article{dai_quantms_ms2rescore_2024,
title = {quantms, ms2rescore and multiple search engines enables deep proteome coverage across protein quantification, immunopeptidomics, and phosphoproteomics experiments},
author = {Dai, Chengxin and Gabriels, Ralf and Bouwmeester, Robbin and Scheid, Jonas and Martens, Lennart and Kohlbacher, Oliver and Bai, Mingze and Sachsenberg, Timo and Perez-Riverol, Yasset},
journal = {[Journal Name]},
year = {2024},
note = {In preparation}
}
To generate the PDF from the LaTeX source:
Ensure you have a LaTeX distribution installed:
- Linux/macOS: Install TeX Live
- Windows: Install MiKTeX or TeX Live
- Online: Use Overleaf (https://overleaf.com)
The manuscript requires the following LaTeX packages:
- amsmath, amssymb: Mathematical symbols
- graphicx: Figure inclusion
- geometry: Page layout
- setspace: Line spacing control
- authblk: Author and affiliation formatting
- cite: Citation management
- caption: Figure caption formatting
- subfigure: Subfigures
- natbib: Bibliography management
- Navigate to the manuscript directory:

      cd ms2rescore-quantms-mcp/

- Compile the LaTeX document:

      # First compilation
      pdflatex main.tex
      # Generate bibliography
      bibtex main
      # Second compilation (resolves citations)
      pdflatex main.tex
      # Third compilation (resolves cross-references)
      pdflatex main.tex

- Alternative single command:

      latexmk -pdf main.tex
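The compilation steps can also be wrapped in a small helper script. The following is a minimal sketch, not part of the repository: the function names `build_commands` and `build_manuscript` are hypothetical. It prefers `latexmk` when it is on the PATH and otherwise falls back to the manual `pdflatex`/`bibtex` sequence.

```shell
#!/usr/bin/env bash
# Sketch: build main.pdf, preferring latexmk and falling back to the
# manual sequence. Function names are illustrative, not from the repo.

# Print the command sequence for whichever builder is available.
build_commands() {
    if command -v latexmk >/dev/null 2>&1; then
        echo "latexmk -pdf main.tex"
    else
        echo "pdflatex main.tex"
        echo "bibtex main"
        echo "pdflatex main.tex"
        echo "pdflatex main.tex"
    fi
}

# Run the commands inside the manuscript directory (first argument,
# defaulting to ms2rescore-quantms-mcp) and stop at the first failure.
build_manuscript() {
    local dir="${1:-ms2rescore-quantms-mcp}"
    (
        cd "$dir" || exit 1
        build_commands | while read -r cmd; do
            echo "+ $cmd"
            # stdin is redirected so a TeX error prompt cannot swallow
            # the remaining commands from the pipe
            $cmd </dev/null || exit 1
        done
    )
}
```

After sourcing the script, calling `build_manuscript` from the repository root runs the build, and a non-zero exit status indicates that one of the steps failed.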
After successful compilation, you'll find:
- main.pdf: The final manuscript
- main.aux: Auxiliary file with references
- main.bbl: Bibliography file
- main.blg: Bibliography log
- main.log: Compilation log
- main.synctex.gz: SyncTeX file for editor integration
- Missing packages: Install required LaTeX packages using your distribution's package manager
- Figure errors: Ensure all figures in the figures/ directory are accessible
- Bibliography errors: Check that references.bib is properly formatted
- Font issues: Some systems may require additional font packages
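For the missing-packages case, one way to find out which packages are absent before compiling is to ask `kpsewhich` for each style file. This is a sketch under the assumption that `kpsewhich` (shipped with TeX Live and MiKTeX) is on the PATH. The function name `missing_packages` is hypothetical, and the package list is copied from the requirements listed in this README.

```shell
#!/usr/bin/env bash
# Sketch: report which required LaTeX packages cannot be located.
# ASSUMPTION: kpsewhich is installed; if it is not, every package is
# reported as missing because the lookup itself fails.

required_packages="amsmath amssymb graphicx geometry setspace authblk cite caption subfigure natbib"

# Print the name of every package whose .sty file kpsewhich cannot find.
missing_packages() {
    local pkg
    for pkg in $required_packages; do
        if ! kpsewhich "$pkg.sty" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}
```

Running `missing_packages` prints one package name per line and prints nothing when all packages are found. The names can then be passed to the distribution's package manager.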
For convenience, you can also compile the document online:
- Upload the entire ms2rescore-quantms-mcp/ folder to Overleaf
- Set main.tex as the main document
- Compile using the platform's interface
This repository contains research materials. For questions or comments about the methodology or results, please contact the authors.
We thank those who supported this research, including funding bodies and the proteomics community for making public datasets available.