Releases: Noble-Lab/casanovo
Casanovo v5.0.0
5.0.0 - 2025-07-09
Added
- Casanovo-DB mode (`casanovo db_search`) to use Casanovo as a learned score function for sequence database searching (given a FASTA protein database).
- During training, model checkpoints will be saved at the end of each training epoch in addition to the checkpoints saved at the end of every validation run.
- In addition to a local file, model weights can be specified as a URL. Upon initial download, the weights file is cached for future reuse.
- Training and optimizer metrics can be logged to a CSV file by setting the `log_metrics` config file option to true. The CSV file will be written to a sub-directory of the output directory named `csv_logs`.
- New configuration options for detailed control of the gradients during training (gradient accumulation, clipping).
- New configuration option `min_peaks` to discard low-quality spectra with too few peaks.
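To illustrate how `min_peaks` might interact with the existing `max_peaks` limit, here is a minimal Python sketch of a per-spectrum filter. The function name `filter_spectrum` and the order of operations are assumptions for illustration, not Casanovo's actual preprocessing code. In particular, the sketch checks `min_peaks` on the raw peak count before keeping the `max_peaks` most intense peaks, whereas the real pipeline may apply it after other preprocessing steps.

```python
from typing import List, Optional, Tuple


def filter_spectrum(
    mz: List[float],
    intensity: List[float],
    min_peaks: int,
    max_peaks: int,
) -> Optional[Tuple[List[float], List[float]]]:
    """Hypothetical helper: discard sparse spectra, keep the most intense peaks.

    Returns None when the spectrum has fewer than ``min_peaks`` peaks.
    Assumes ``mz`` is sorted in ascending order.
    """
    if len(mz) < min_peaks:
        return None  # too few peaks: treat as low quality and drop the spectrum
    # Rank peak indices by intensity (highest first) and keep the top max_peaks.
    ranked = sorted(range(len(mz)), key=lambda i: intensity[i], reverse=True)
    keep = sorted(ranked[:max_peaks])  # restore ascending m/z order
    return [mz[i] for i in keep], [intensity[i] for i in keep]


print(filter_spectrum([100.0, 200.0, 300.0, 400.0], [5.0, 1.0, 9.0, 3.0], min_peaks=2, max_peaks=2))
# ([100.0, 300.0], [5.0, 9.0])
```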
Changed
- Removed the `evaluate` sub-command; all model evaluation functionality has been moved to the `sequence` command using the new `--evaluate` flag.
- The `--output` option has been split into two options, `--output_dir` and `--output_root`.
- The path suffix (extension) of `--output_root` will no longer be removed as it was with the old `--output` option.
- The `--validation_peak_path` option is now optional when training; if `--validation_peak_path` is not set, then `train_peak_path` will also be used for validation.
- The `tb_summarywriter` config option is now a boolean config option, and if set to true the TensorBoard summary will be written to a sub-directory of the output directory named `tensorboard`.
- Input peak files can now be specified both as individual file(s) and as a directory.
- Peptidoforms are specified using ProForma 2.0 notation by default.
- DepthCharge is upgraded to the latest version 0.4.8.
- The product of the raw amino acid scores is used as the peptide score, rather than the arithmetic mean.
- Amino acid scores are directly reported, rather than averaged with the peptide score.
- The amino acid-level score of stand-alone N-terminal modifications is combined with that of the leading N-terminal residue.
- Renamed the `n_peaks` configuration option, which sets the maximum number of peaks to retain in a spectrum, to `max_peaks`.
- Beam search decoding has been optimized for computational efficiency, increasing prediction speed.
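Switching from the arithmetic mean to the product of the raw amino acid scores can change the peptide score substantially. Under the product, a single low-confidence residue pulls the whole score down multiplicatively. The sketch below compares the two aggregations on made-up residue scores. The function names are hypothetical and the numbers are illustrative only.

```python
import math
from typing import Sequence


def peptide_score_product(aa_scores: Sequence[float]) -> float:
    """Peptide score as the product of per-residue scores (v5.0.0 behavior as described)."""
    return math.prod(aa_scores)


def peptide_score_mean(aa_scores: Sequence[float]) -> float:
    """Peptide score as the arithmetic mean of per-residue scores (earlier behavior)."""
    return sum(aa_scores) / len(aa_scores)


scores = [0.9, 0.8, 0.5]  # hypothetical per-residue scores
print(round(peptide_score_product(scores), 4))  # 0.36
print(round(peptide_score_mean(scores), 4))     # 0.7333
```

A consequence of the product is that longer peptides tend to receive lower scores than shorter ones with similar per-residue confidence.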
Fixed
- Precursor charges are exported as integers instead of floats in the mzTab output file, in compliance with the mzTab specification.
- Fixed log entries being written to the config file instead of the log file when running the `configure` command.
Removed
- Removed the `save_top_k` option from the Casanovo config; the model with the lowest validation loss during training will now be saved to a fixed filename, `<output_root>.best.ckpt`.
- The `model_save_folder_path` config option has been removed; model checkpoints will now be saved to `--output_dir` during training.
Casanovo v4.3.0
4.3.0 - 2024-12-13
Fixed
- Amino acid scores in the mzTab output were reported in reversed order.
Casanovo v4.2.1
4.2.1 - 2024-06-25
Fixed
- Pin NumPy version to below v2.0 to ensure compatibility with current DepthCharge version.
Casanovo v4.2.0
4.2.0 - 2024-05-14
Added
- A deprecation warning will be issued when deprecated config options are used in the config file or in the model weights file.
Changed
- Config option `max_iters` has been renamed to `cosine_schedule_period_iters` to better reflect that it controls the number of iterations for the cosine half period of the learning rate.
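As a rough illustration of a cosine half period, the sketch below computes a generic cosine learning-rate multiplier. The multiplier falls from 1 to 0 over `cosine_schedule_period_iters` iterations and, if training continues, rises again over the next half period. Casanovo's actual scheduler may also include a warmup phase and may differ in other details. The function name here is hypothetical.

```python
import math


def cosine_lr_factor(iteration: int, cosine_schedule_period_iters: int) -> float:
    """Generic cosine multiplier: 1.0 at iteration 0, 0.0 after one half period."""
    return 0.5 * (1.0 + math.cos(math.pi * iteration / cosine_schedule_period_iters))


period = 1000  # hypothetical value
for it in (0, 500, 1000, 2000):
    print(it, round(cosine_lr_factor(it, period), 3))
```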
Fixed
- Fix beam search caching failure when multiple beams have an equal predicted peptide score by breaking ties randomly.
- The mzTab output file now has proper line endings regardless of platform, fixing the extra `\r` found when run on Windows.
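Breaking ties randomly means that no beam is systematically favored by its position when several beams share the same predicted peptide score. One simple way to do this is to shuffle the candidate indices and then apply a stable sort by score, so that equal-scoring candidates end up in random relative order. The sketch below is a hypothetical illustration of that approach, not Casanovo's implementation.

```python
import random
from typing import List, Optional, Sequence


def top_k_random_ties(
    scores: Sequence[float], k: int, rng: Optional[random.Random] = None
) -> List[int]:
    """Return indices of the k highest scores, breaking ties uniformly at random."""
    rng = rng or random.Random()
    indices = list(range(len(scores)))
    rng.shuffle(indices)  # randomize the order of candidates first
    # Python's sort is stable (also with reverse=True), so candidates with
    # equal scores keep their shuffled relative order.
    indices.sort(key=lambda i: scores[i], reverse=True)
    return indices[:k]


print(top_k_random_ties([0.5, 0.9, 0.9, 0.1], k=1, rng=random.Random(0)))
```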
Casanovo v4.1.0
4.1.0 - 2024-02-16
Changed
- Instead of having to specify `train_from_scratch` in the config file, training will proceed from an existing model weights file if this is given as an argument to `casanovo train`.
Fixed
- Fixed beam search decoding error due to non-deterministic selection of beams with equal scores.
Casanovo v4.0.1
4.0.1 - 2023-12-25
Fixed
- Fix automatic PyPI upload.
Casanovo v4.0.0
4.0.0 - 2023-12-22
Added
- Checkpoints include model parameters, allowing for mismatches with the provided configuration file.
- The `accelerator` parameter controls the accelerator (CPU, GPU, etc.) that is used.
- The `devices` parameter controls the number of accelerators used.
- The `val_check_interval` parameter controls the frequency of both validation epochs and model checkpointing during training.
- The `train_label_smoothing` parameter controls the amount of label smoothing applied when calculating the training loss.
Changed
- The CLI has been overhauled to use subcommands.
- Upgraded to Lightning >=2.0.
- Checkpointing is configured to save the top-k models instead of all.
- Log steps rather than epochs as units of progress during training.
- Validation performance metrics are logged (and added to tensorboard) at the validation epoch, and training loss is logged at the end of training epoch, i.e. training and validation metrics are logged asynchronously.
- Irrelevant warning messages on the console output and in the log file are no longer shown.
- Nicely format logged warnings.
- `every_n_train_steps` has been renamed to `val_check_interval` in accordance with the corresponding PyTorch Lightning parameter.
- Training batches are randomly shuffled.
- Upgraded to Torch >=2.1.
Removed
- Remove config option for a custom PyTorch Lightning logger.
- Remove superfluous `custom_encoder` config option.
Fixed
- Casanovo runs on CPU and can pass all tests.
- Correctly refer to input peak files by their full file path.
- Specifying custom residues to retrain Casanovo is now possible.
- Upgrade to depthcharge v0.2.3 to fix sinusoidal encoding and for the `PeptideTransformerDecoder` hotfix.
- Correctly report amino acid precision and recall during validation.
Casanovo v3.5.0
3.5.0 - 2023-08-16
Fixed
- Don't try to assign a non-existent output writer during eval mode.
- Specifying custom residues to retrain Casanovo is now possible.
Casanovo v3.4.0
3.4.0 - 2023-06-19
Added
- The `every_n_train_steps` parameter now controls the frequency of both validation epochs and model checkpointing during training.
Changed
- We now log steps rather than epochs as units of progress during training.
- Validation performance metrics are logged (and added to tensorboard) at the validation epoch, and training loss is logged at the end of training epoch, i.e. training and validation metrics are logged asynchronously.
Fixed
- Correctly refer to input peak files by their full file path.
Casanovo v3.3.0
3.3.0 - 2023-04-04
Added
- Included the `min_peptide_len` parameter in the configuration file to restrict predictions to peptides with a minimum length.
- Export multiple PSMs per spectrum using the `top_match` parameter in the configuration file.
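The sketch below shows one hypothetical way `min_peptide_len` and `top_match` could be applied to a spectrum's candidate peptides. It drops candidates shorter than the minimum length, then reports up to `top_match` of the highest-scoring remaining ones as PSMs. The data layout, a list of `(residue tokens, score)` tuples, is an assumption made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: (residue tokens, peptide score). Modified residues
# would be single tokens, so len(tokens) counts residues, not characters.
Candidate = Tuple[List[str], float]


def select_psms(
    candidates: List[Candidate], min_peptide_len: int, top_match: int
) -> List[Candidate]:
    """Keep candidates with at least min_peptide_len residues; return the top_match best."""
    long_enough = [c for c in candidates if len(c[0]) >= min_peptide_len]
    long_enough.sort(key=lambda c: c[1], reverse=True)  # highest score first
    return long_enough[:top_match]


candidates = [
    (list("PEPTIDE"), 0.91),
    (list("PEP"), 0.95),  # discarded when min_peptide_len > 3
    (list("PEPTLDE"), 0.88),
    (list("PEPTIDK"), 0.42),
]
for residues, score in select_psms(candidates, min_peptide_len=6, top_match=2):
    print("".join(residues), score)
```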
Changed
- Calculate the amino acid scores as the average of the amino acid scores and the peptide score.
- Spectra from mzML and mzXML peak files are referred to by their scan numbers in the mzTab output instead of their indexes.
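The v3.3.0 amino acid scoring rule amounts to a one-line calculation: each reported amino acid score is the mean of that residue's own score and the overall peptide score. The sketch below illustrates this with made-up values. The function name is hypothetical.

```python
from typing import List


def averaged_aa_scores(aa_scores: List[float], peptide_score: float) -> List[float]:
    """Each reported residue score = (residue score + peptide score) / 2."""
    return [(s + peptide_score) / 2 for s in aa_scores]


print(averaged_aa_scores([0.8, 0.6], peptide_score=0.7))
```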
Fixed
- Verify that the final predicted amino acid is the stop token.
- Spectra are correctly matched to their input peak file when analyzing multiple files simultaneously.
- The score of the stop token is taken into account when calculating the predicted peptide score.
- Peptides with incorrect N-terminal modifications (multiple or internal positions) are no longer predicted.
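Two of the fixes above concern the stop token that ends a decoded sequence. The hypothetical sketch below rejects predictions whose final token is not the stop token. It also includes the stop token's own score when aggregating per-token scores into a peptide score. The `$` stop symbol, the token layout, and mean aggregation are assumptions for illustration; the mean reflects the behavior before v5.0.0 switched to a product.

```python
from typing import List, Optional

STOP = "$"  # hypothetical stop-token symbol


def score_prediction(tokens: List[str], token_scores: List[float]) -> Optional[float]:
    """Return the mean score over all tokens, including the stop token.

    Returns None if the sequence does not end with the stop token,
    in which case the prediction is treated as invalid.
    """
    if len(tokens) != len(token_scores):
        raise ValueError("each token needs exactly one score")
    if not tokens or tokens[-1] != STOP:
        return None
    return sum(token_scores) / len(token_scores)


print(score_prediction(["P", "E", "P", STOP], [0.9, 0.8, 0.9, 0.6]))
print(score_prediction(["P", "E", "P"], [0.9, 0.8, 0.9]))  # None
```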