
Commit 1fcec6a

bittremieux, melihyilmaz, github-actions[bot], wsnoble, and mobiusklein authored
Prepare release v4.2.0 (#331)
* Remove `train_from_scratch` config option (#275)
  Instead of having to specify `train_from_scratch` in the config file, training will proceed from an existing model weights file if this is given as an argument to `casanovo train`. Fixes #263.

* Stabilize torch.topk() behavior (#290)
  * Add epsilon to index zero
  * Fix typo
  * Use base PyTorch for repeating along the vocabulary size
  * Combine masking steps
  * Lint with updated black version
  * Lint test files
  * Add topk unit test
  * Fix lint
  * Add fixme comment for future
  * Update changelog
  * Generate new screengrabs with rich-codex
  ---------
  Co-authored-by: Wout Bittremieux <wout@bittremieux.be>
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Rename max_iters to cosine_schedule_period_iters (#300)
  * Rename max_iters to cosine_schedule_period_iters
  * Add deprecated config option unit test
  * Fix missed rename
  * Proper linting
  * Remove unnecessary logging
  * Test that checkpoints with deprecated config options can be loaded
  * Minor change
  * Add test for fine-tuning with deprecated config options
  * Remove deprecated hyperparameters during model loading
  * Include deprecated hyperparameter warning
  * Test whether the warning is issued
  * Verify that the deprecated option is removed
  * Fix comments
  * Avoid defining deprecated options twice
  * Remap previous renamed config option `every_n_train_steps`
  * Update changelog
  ---------
  Co-authored-by: melihyilmaz <yilmazmelih97@gmail.com>

* Add FAQ entry about antibody sequencing

* Don't crash when multiple beams have identical peptide scores (#306)
  * Test different beams with identical scores
  * Randomly break ties for beams with identical peptide score
  * Update changelog
  * Don't remove unit test

* Allow csv to handle all newlines (#316)

* Add 9-species model weights link to FAQ (#303)
  * Add model weights link
  * Generate new screengrabs with rich-codex
  * Clarify that these weights should only be used for benchmarking
  ---------
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
  Co-authored-by: Wout Bittremieux <wout@bittremieux.be>

* Add FAQ entry about antibody sequencing (#304)
  * Add FAQ entry about antibody sequencing
  * Generate new screengrabs with rich-codex
  ---------
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
  Co-authored-by: Melih Yilmaz <32707537+melihyilmaz@users.noreply.github.com>

* Allow csv to handle all newlines
  The `csv` module tries to handle newlines itself. On Windows, this leads to line endings of `\r\r\n` instead of `\r\n`. Setting `newline=''` produces the intended output on both platforms.

* Update CHANGELOG.md

* Fix linting issue

* Delete docs/images/help.svg
  ---------
  Co-authored-by: Melih Yilmaz <32707537+melihyilmaz@users.noreply.github.com>
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
  Co-authored-by: Wout Bittremieux <wout@bittremieux.be>
  Co-authored-by: William Stafford Noble <wnoble@uw.edu>
  Co-authored-by: Wout Bittremieux <bittremieux@users.noreply.github.com>

* Don't test on macOS versions with MPS (#327)

* Prepare for release v4.2.0

* Update CHANGELOG.md (#332)

---------
Co-authored-by: Melih Yilmaz <32707537+melihyilmaz@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: melihyilmaz <yilmazmelih97@gmail.com>
Co-authored-by: wsnoble <wnoble@uw.edu>
Co-authored-by: Joshua Klein <mobiusklein@gmail.com>
1 parent 6dc301c commit 1fcec6a
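The fix in #306 addresses beams that end up with exactly equal peptide scores: such ties are now broken randomly so that equal-scoring beams no longer collide in the beam-search cache. The snippet below is only an illustrative sketch of random tie-breaking, not Casanovo's actual beam-search code; the function and variable names (top_k_with_random_ties, scores, k) are hypothetical.

# Illustrative sketch only: Casanovo's beam-search caching code is not part of
# this commit, and the names used here are made up for the example.
import random

def top_k_with_random_ties(scores: list[float], k: int) -> list[int]:
    """Return the indices of the k best scores, breaking exact ties randomly."""
    # Attach a random secondary key so beams with identical scores are ordered
    # arbitrarily instead of colliding on the same sort position.
    keyed = sorted(
        ((score, random.random(), idx) for idx, score in enumerate(scores)),
        reverse=True,
    )
    return [idx for _, _, idx in keyed[:k]]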

File tree

11 files changed (+231, -97 lines)


.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [ubuntu-latest, windows-latest, macos-latest]
+        os: [ubuntu-latest, windows-latest, macos-13]

     steps:
       - uses: actions/checkout@v4

CHANGELOG.md

Lines changed: 17 additions & 1 deletion
@@ -6,6 +6,21 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

 ## [Unreleased]

+## [4.2.0] - 2024-05-14
+
+### Added
+
+- A deprecation warning will be issued when deprecated config options are used in the config file or in the model weights file.
+
+### Changed
+
+- Config option `max_iters` has been renamed to `cosine_schedule_period_iters` to better reflect that it controls the number of iterations for the cosine half period of the learning rate.
+
+### Fixed
+
+- Fix beam search caching failure when multiple beams have an equal predicted peptide score by breaking ties randomly.
+- The mzTab output file now has proper line endings regardless of platform, fixing the extra `\r` found when run on Windows.
+
 ## [4.1.0] - 2024-02-16

 ### Changed
@@ -233,7 +248,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

 - Initial Casanovo version.

-[Unreleased]: https://github.com/Noble-Lab/casanovo/compare/v4.1.0...HEAD
+[Unreleased]: https://github.com/Noble-Lab/casanovo/compare/v4.2.0...HEAD
+[4.2.0]: https://github.com/Noble-Lab/casanovo/compare/v4.1.0...v4.2.0
 [4.1.0]: https://github.com/Noble-Lab/casanovo/compare/v4.0.1...v4.1.0
 [4.0.1]: https://github.com/Noble-Lab/casanovo/compare/v4.0.0...v4.0.1
 [4.0.0]: https://github.com/Noble-Lab/casanovo/compare/v3.5.0...v4.0.0

casanovo/config.py

Lines changed: 19 additions & 1 deletion
@@ -2,6 +2,7 @@

 import logging
 import shutil
+import warnings
 from pathlib import Path
 from typing import Optional, Dict, Callable, Tuple, Union

@@ -12,6 +13,14 @@
 logger = logging.getLogger("casanovo")


+# FIXME: This contains deprecated config options to be removed in the next major
+# version update.
+_config_deprecated = dict(
+    every_n_train_steps="val_check_interval",
+    max_iters="cosine_schedule_period_iters",
+)
+
+
 class Config:
     """The Casanovo configuration options.

@@ -56,7 +65,7 @@ class Config:
         tb_summarywriter=str,
         train_label_smoothing=float,
         warmup_iters=int,
-        max_iters=int,
+        cosine_schedule_period_iters=int,
         learning_rate=float,
         weight_decay=float,
         train_batch_size=int,
@@ -84,6 +93,15 @@ def __init__(self, config_file: Optional[str] = None):
         else:
             with Path(config_file).open() as f_in:
                 self._user_config = yaml.safe_load(f_in)
+            # Remap deprecated config entries.
+            for old, new in _config_deprecated.items():
+                if old in self._user_config:
+                    self._user_config[new] = self._user_config.pop(old)
+                    warnings.warn(
+                        f"Deprecated config option '{old}' remapped to "
+                        f"'{new}'",
+                        DeprecationWarning,
+                    )
         # Check for missing entries in config file.
         config_missing = self._params.keys() - self._user_config.keys()
         if len(config_missing) > 0:
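To see what the remapping above does in practice, here is a standalone sketch of the same logic applied to a hypothetical user config dictionary (the `user_config` values are invented for illustration; in Casanovo the dictionary comes from the user's YAML file):

import warnings

# Mirrors the `_config_deprecated` mapping and the remapping loop added above.
_config_deprecated = dict(
    every_n_train_steps="val_check_interval",
    max_iters="cosine_schedule_period_iters",
)

user_config = {"max_iters": 600_000, "n_beams": 1}  # hypothetical user YAML contents
for old, new in _config_deprecated.items():
    if old in user_config:
        user_config[new] = user_config.pop(old)
        warnings.warn(
            f"Deprecated config option '{old}' remapped to '{new}'",
            DeprecationWarning,
        )

# user_config is now {'n_beams': 1, 'cosine_schedule_period_iters': 600000},
# and a DeprecationWarning has been emitted for the old key.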

casanovo/config.yaml

Lines changed: 44 additions & 45 deletions
@@ -4,103 +4,102 @@
 ###

 ###
-# The following parameters can be modified when running inference or
-# when fine-tuning an existing Casanovo model.
+# The following parameters can be modified when running inference or when
+# fine-tuning an existing Casanovo model.
 ###

-# Max absolute difference allowed with respect to observed precursor m/z
+# Max absolute difference allowed with respect to observed precursor m/z.
 # Predictions outside the tolerance range are assigned a negative peptide score.
 precursor_mass_tol: 50 # ppm
-# Isotopes to consider when comparing predicted and observed precursor m/z's
+# Isotopes to consider when comparing predicted and observed precursor m/z's.
 isotope_error_range: [0, 1]
-# The minimum length of predicted peptides
+# The minimum length of predicted peptides.
 min_peptide_len: 6
-# Number of spectra in one inference batch
+# Number of spectra in one inference batch.
 predict_batch_size: 1024
-# Number of beams used in beam search
+# Number of beams used in beam search.
 n_beams: 1
-# Number of PSMs for each spectrum
+# Number of PSMs for each spectrum.
 top_match: 1
 # The hardware accelerator to use. Must be one of:
-# "cpu", "gpu", "tpu", "ipu", "hpu", "mps", or "auto"
+# "cpu", "gpu", "tpu", "ipu", "hpu", "mps", or "auto".
 accelerator: "auto"
-# The devices to use. Can be set to a positive number int,
-# or the value -1 to indicate all available devices should be used,
-# If left empty, the appropriate number will be automatically
-# selected for automatic selected on the chosen accelerator.
+# The devices to use. Can be set to a positive number int, or the value -1 to
+# indicate all available devices should be used. If left empty, the appropriate
+# number will be automatically selected for based on the chosen accelerator.
 devices:

 ###
 # The following parameters should only be modified if you are training a new
 # Casanovo model from scratch.
 ###

-# Random seed to ensure reproducible results
+# Random seed to ensure reproducible results.
 random_seed: 454

 # OUTPUT OPTIONS
-# Logging frequency in training steps
+# Logging frequency in training steps.
 n_log: 1
-# Tensorboard directory to use for keeping track of training metrics
+# Tensorboard directory to use for keeping track of training metrics.
 tb_summarywriter:
-# Save the top k model checkpoints during training. -1 saves all, and
-# leaving this field empty saves none.
+# Save the top k model checkpoints during training. -1 saves all, and leaving
+# this field empty saves none.
 save_top_k: 5
-# Path to saved checkpoints
+# Path to saved checkpoints.
 model_save_folder_path: ""
-# Model validation and checkpointing frequency in training steps
+# Model validation and checkpointing frequency in training steps.
 val_check_interval: 50_000

 # SPECTRUM PROCESSING OPTIONS
-# Number of the most intense peaks to retain, any remaining peaks are discarded
+# Number of the most intense peaks to retain, any remaining peaks are discarded.
 n_peaks: 150
-# Min peak m/z allowed, peaks with smaller m/z are discarded
+# Min peak m/z allowed, peaks with smaller m/z are discarded.
 min_mz: 50.0
-# Max peak m/z allowed, peaks with larger m/z are discarded
+# Max peak m/z allowed, peaks with larger m/z are discarded.
 max_mz: 2500.0
-# Min peak intensity allowed, less intense peaks are discarded
+# Min peak intensity allowed, less intense peaks are discarded.
 min_intensity: 0.01
-# Max absolute m/z difference allowed when removing the precursor peak
+# Max absolute m/z difference allowed when removing the precursor peak.
 remove_precursor_tol: 2.0 # Da
-# Max precursor charge allowed, spectra with larger charge are skipped
+# Max precursor charge allowed, spectra with larger charge are skipped.
 max_charge: 10

 # MODEL ARCHITECTURE OPTIONS
-# Dimensionality of latent representations, i.e. peak embeddings
+# Dimensionality of latent representations, i.e. peak embeddings.
 dim_model: 512
-# Number of attention heads
+# Number of attention heads.
 n_head: 8
-# Dimensionality of fully connected layers
+# Dimensionality of fully connected layers.
 dim_feedforward: 1024
-# Number of transformer layers in spectrum encoder and peptide decoder
+# Number of transformer layers in spectrum encoder and peptide decoder.
 n_layers: 9
-# Dropout rate for model weights
+# Dropout rate for model weights.
 dropout: 0.0
-# Number of dimensions to use for encoding peak intensity
-# Projected up to ``dim_model`` by default and summed with the peak m/z encoding
+# Number of dimensions to use for encoding peak intensity.
+# Projected up to `dim_model` by default and summed with the peak m/z encoding.
 dim_intensity:
-# Max decoded peptide length
+# Max decoded peptide length.
 max_length: 100
-# Number of warmup iterations for learning rate scheduler
+# The number of iterations for the linear warm-up of the learning rate.
 warmup_iters: 100_000
-# Max number of iterations for learning rate scheduler
-max_iters: 600_000
-# Learning rate for weight updates during training
+# The number of iterations for the cosine half period of the learning rate.
+cosine_schedule_period_iters: 600_000
+# Learning rate for weight updates during training.
 learning_rate: 5e-4
-# Regularization term for weight updates
+# Regularization term for weight updates.
 weight_decay: 1e-5
-# Amount of label smoothing when computing the training loss
+# Amount of label smoothing when computing the training loss.
 train_label_smoothing: 0.01

 # TRAINING/INFERENCE OPTIONS
-# Number of spectra in one training batch
+# Number of spectra in one training batch.
 train_batch_size: 32
-# Max number of training epochs
+# Max number of training epochs.
 max_epochs: 30
-# Number of validation steps to run before training begins
+# Number of validation steps to run before training begins.
 num_sanity_val_steps: 0
-# Calculate peptide and amino acid precision during training. this
-# is expensive, so we recommend against it.
+# Calculate peptide and amino acid precision during training.
+# This is expensive, so we recommend against it.
 calculate_precision: False

 # AMINO ACID AND MODIFICATION VOCABULARY
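The renamed `cosine_schedule_period_iters` option sets the length of the cosine half period of the learning rate, while `warmup_iters` sets the linear warm-up. Casanovo's scheduler implementation is not part of this diff, so the following is only a sketch of a generic linear warm-up plus cosine decay driven by those two values; the exact formula used in Casanovo may differ.

import math

def lr_scale(step: int, warmup_iters: int, cosine_period_iters: int) -> float:
    """Multiplicative learning-rate factor at a given optimizer step (sketch)."""
    if step < warmup_iters:
        # Linear warm-up from 0 to the base learning rate.
        return step / warmup_iters
    # Cosine half period: decays from 1.0 right after warm-up to 0.0 once
    # `cosine_period_iters` further steps have elapsed.
    progress = min((step - warmup_iters) / cosine_period_iters, 1.0)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

# With the defaults above (warmup_iters: 100_000, cosine_schedule_period_iters:
# 600_000), the factor is 0.5 halfway through warm-up and again halfway through
# the cosine period.
print(lr_scale(50_000, 100_000, 600_000))   # 0.5
print(lr_scale(400_000, 100_000, 600_000))  # ~0.5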

casanovo/data/ms_io.py

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ def save(self) -> None:
         """
         Export the spectrum identifications to the mzTab file.
         """
-        with open(self.filename, "w") as f:
+        with open(self.filename, "w", newline="") as f:
             writer = csv.writer(f, delimiter="\t", lineterminator=os.linesep)
             # Write metadata.
             for row in self.metadata:
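For context on the one-line change above: in text mode, Python translates every "\n" it writes into `os.linesep`, so on Windows the csv writer's explicit "\r\n" line terminator was being expanded to "\r\r\n". Opening the file with `newline=""` disables that translation. Below is a minimal sketch of the fixed pattern; the file name and the row written are invented for illustration.

import csv
import os

# newline="" stops the text layer from re-translating the "\n" that the csv
# writer already emits as part of os.linesep, avoiding "\r\r\n" on Windows.
with open("example.mztab", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator=os.linesep)
    writer.writerow(["MTD", "mzTab-version", "1.0.0"])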

0 commit comments
