
Commit c469bb1

Fix dataset description itemizations (#471)
1 parent 43c9575 commit c469bb1


1 file changed (+5 -3 lines)


docs/faq.md

Lines changed: 5 additions & 3 deletions
@@ -108,11 +108,13 @@ These PSMs were obtained by collecting up to the top 100 PSMs for each of the pr
 The data are available as annotated MGF files on Zenodo:
 - [MassIVE-KB v1 30 million PSMs](https://doi.org/10.5281/zenodo.14973856)
 - [MassIVE-KB v1 30 million PSMs training/validation/test splits](https://doi.org/10.5281/zenodo.14967861)
-The dataset was originally compiled through the following steps:
-- On the [MassIVE website](https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp), go to [MassIVE Knowledge Base](https://massive.ucsd.edu/ProteoSAFe/static/massive-kb-libraries.jsp) > [Human HCD Spectral Library](https://massive.ucsd.edu/ProteoSAFe/status.jsp?task=82c0124b6053407fa41ba98f53fd8d89) > [All Candidate library spectra](https://massive.ucsd.edu/ProteoSAFe/result.jsp?task=82c0124b6053407fa41ba98f53fd8d89&view=candidate_library_spectra) > Download.
+
+The dataset was originally compiled through the following steps:
+
+- On the [MassIVE website](https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp), go to [MassIVE Knowledge Base](https://massive.ucsd.edu/ProteoSAFe/static/massive-kb-libraries.jsp) > [Human HCD Spectral Library](https://massive.ucsd.edu/ProteoSAFe/status.jsp?task=82c0124b6053407fa41ba98f53fd8d89) > [All Candidate library spectra](https://massive.ucsd.edu/ProteoSAFe/result.jsp?task=82c0124b6053407fa41ba98f53fd8d89&view=candidate_library_spectra) > Download.
 - This will give you a zipped TSV file with the metadata and peptide identifications for all 30 million PSMs.
 - Using the filename (column "filename") you can then retrieve the corresponding peak files from the MassIVE FTP server (done using a wget script) and extract the desired spectra using their scan number (column "scan").
-2. The _non-enzymatic dataset_, used to train a non-tryptic version of Casanovo, was created by selecting PSMs with a uniform distribution of amino acids at the C-terminal peptide positions from two datasets: MassIVE-KB and PROSPECT.
+3. The _non-enzymatic dataset_, used to train a non-tryptic version of Casanovo, was created by selecting PSMs with a uniform distribution of amino acids at the C-terminal peptide positions from two datasets: MassIVE-KB and PROSPECT.
 Training, validation, and test splits for the non-enzymatic dataset are available as annotated MGF files on MassIVE with [dataset identifier MSV000094014](https://doi.org/doi:10.25345/C5KS6JG0W).
 
 **Casanovo v4.2**
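The MassIVE-KB compilation steps described in the FAQ text above (download the zipped TSV, group PSMs by the "filename" column, fetch each peak file with wget, then pick spectra by the "scan" column) can be sketched in Python. This is a minimal sketch, not the script the Casanovo authors used: only the "filename" and "scan" column names come from the FAQ, while the FTP base URL `ftp://massive.ucsd.edu/`, the archive layout (TSV as the first zip member), and all helper names are hypothetical assumptions.

```python
import csv
import io
import zipfile
from collections import defaultdict

# Hypothetical FTP root: verify how the "filename" values map onto the
# MassIVE FTP directory layout before running the generated script.
FTP_ROOT = "ftp://massive.ucsd.edu/"


def read_zipped_tsv(source):
    """Yield text lines from the first member of a zip archive.

    `source` may be a path or a binary file-like object; the archive is
    assumed (hypothetically) to contain the TSV as its first member.
    """
    with zipfile.ZipFile(source) as archive:
        with archive.open(archive.namelist()[0]) as raw:
            yield from io.TextIOWrapper(raw, encoding="utf-8", newline="")


def scans_by_file(lines):
    """Map each peak file (column "filename") to its sorted scan numbers (column "scan")."""
    groups = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        groups[row["filename"]].add(int(row["scan"]))
    return {name: sorted(scans) for name, scans in groups.items()}


def wget_script(filenames, root=FTP_ROOT):
    """Return a shell script with one wget command per unique peak file."""
    commands = ["#!/bin/sh"]
    for name in sorted(filenames):
        # -nc: skip files already on disk; -x: recreate the remote directory tree locally.
        commands.append(f"wget -nc -x '{root}{name}'")
    return "\n".join(commands) + "\n"


# Example usage (the archive name is hypothetical):
# groups = scans_by_file(read_zipped_tsv("candidate_library_spectra.zip"))
# with open("download_peaks.sh", "w") as handle:
#     handle.write(wget_script(groups))
```

Grouping scans per file means each peak file is downloaded only once even though it contributes many PSMs; the per-file scan lists can then drive the extraction of the individual spectra once the files are on disk.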

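The selection criterion for the non-enzymatic dataset in the diff above, a uniform distribution of amino acids at the C-terminal position, can be illustrated with per-residue downsampling. This is one plausible way to enforce that criterion, not the procedure actually used to build MSV000094014: the PSM layout as `(peptide, spectrum_id)` tuples and the helper names are hypothetical, and modifications are assumed to be written as mass shifts such as `C+57.021`.

```python
import random
from collections import defaultdict


def c_terminal_residue(peptide):
    """Return the last amino acid letter, ignoring mass-shift notation such as "+57.021"."""
    letters = [ch for ch in peptide if ch.isupper()]
    if not letters:
        raise ValueError(f"no amino acid found in {peptide!r}")
    return letters[-1]


def sample_uniform_c_term(psms, per_residue=None, seed=0):
    """Downsample PSMs so each C-terminal residue contributes the same number.

    `psms` is a list of (peptide, spectrum_id) tuples (hypothetical layout).
    If `per_residue` is None, the size of the smallest residue group is used,
    which yields an exactly uniform C-terminal distribution.
    """
    by_residue = defaultdict(list)
    for psm in psms:
        by_residue[c_terminal_residue(psm[0])].append(psm)
    if not by_residue:
        return []
    if per_residue is None:
        per_residue = min(len(group) for group in by_residue.values())
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = []
    for residue in sorted(by_residue):
        group = by_residue[residue]
        # Sample without replacement; smaller groups contribute all of their PSMs.
        selected.extend(rng.sample(group, min(per_residue, len(group))))
    return selected
```

Downsampling every residue to the smallest group gives exact balance at the cost of dataset size; passing a larger `per_residue` cap trades some balance for more PSMs.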