Lilferrit
Contributor

To address #108, the --model parameter now accepts URLs. If --model is a valid URL, the Casanovo CLI first looks for the weight file in the cache (the cache entry is resolved by hashing the URL). If the weight file isn't cached, the CLI attempts to download and cache it. The CLI raises an error if --model is set but is neither a valid URL nor a file in the local filesystem.

However, this implementation has a limitation: if the weight file at a URL is updated after the CLI has already downloaded weights from that URL, the new weights won't be downloaded because the lookup still produces a cache hit. To fix this, I could introduce a --force option that skips the cache and always attempts to download and cache the weights for a given URL.
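
For illustration, the way a --model value is sorted into "URL", "local file", or "error" could look roughly like the sketch below. This is not the code from the PR: `is_valid_url` and `classify_model_arg` are hypothetical names, and treating a string as a URL only when it has both a scheme and a network location is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_valid_url(candidate: str) -> bool:
    """Return True if the string has both a scheme and a network location."""
    parsed = urlparse(candidate)
    return bool(parsed.scheme) and bool(parsed.netloc)


def classify_model_arg(model: str) -> str:
    """Hypothetical helper: decide how a --model value would be handled."""
    if is_valid_url(model):
        # Would go through the cache lookup and, on a miss, the download.
        return "url"
    if Path(model).is_file():
        # Existing local weights file, used as-is.
        return "file"
    raise ValueError(f"{model} is neither a valid URL nor an existing file")
```

Requiring a network location keeps plain relative paths such as `weights.ckpt` and Windows paths with a drive letter from being mistaken for URLs.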

@Lilferrit Lilferrit linked an issue Jun 28, 2024 that may be closed by this pull request

codecov bot commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 91.80328% with 5 lines in your changes missing coverage. Please review.

Project coverage is 94.03%. Comparing base (276a50e) to head (c2bc971).

| Files | Patch % | Lines |
|---|---|---|
| casanovo/casanovo.py | 91.80% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #349      +/-   ##
==========================================
+ Coverage   89.98%   94.03%   +4.04%     
==========================================
  Files          12       12              
  Lines         979     1022      +43     
==========================================
+ Hits          881      961      +80     
+ Misses         98       61      -37     


@bittremieux bittremieux changed the base branch from main to dev June 28, 2024 08:13
@Lilferrit Lilferrit marked this pull request as ready for review July 2, 2024 21:06
Collaborator

@bittremieux bittremieux left a comment

A few small comments.

Two key things are still missing as well:

  1. Documentation of this functionality on readthedocs (maybe on the file formats page).
  2. Unit tests for the download behavior.

@bittremieux
Collaborator

> However this implementation does have an issue where if the weight file at a URL is updated, but the CLI has previously downloaded weights from that URL, the new weights won't be downloaded because there will still be a cache hit.

You could also check the "last-modified" information for the URL and verify whether this is more recent than the download time or not.
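
A minimal sketch of that comparison, assuming the cached file's modification time stands in for the download time and that the remote timestamp has already been parsed into a timezone-aware `datetime`. The name `cache_is_stale` is hypothetical and not taken from the PR.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def cache_is_stale(cached_file: Path, url_last_modified: Optional[datetime]) -> bool:
    """Return True if the weights should be downloaded (again)."""
    if not cached_file.is_file():
        # Nothing cached yet, so a download is needed.
        return True
    if url_last_modified is None:
        # No "Last-Modified" information available; trust the cache.
        return False
    # Assumption: the file's mtime approximates when it was downloaded.
    cached_time = datetime.fromtimestamp(
        cached_file.stat().st_mtime, tz=timezone.utc
    )
    return url_last_modified > cached_time
```

Falling back to the cached copy when the header is missing keeps the behavior identical to a plain cache hit for servers that don't report a modification time.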

@Lilferrit
Contributor Author

I adjusted the hashing implementation so that files downloaded from a user-specified URL are now saved to <cache_dir>/<url_hash>/<file_name>.ckpt (where <file_name> is the actual file name in the URL). Files downloaded automatically by _get_model_weights are still saved directly to the cache directory; the hash subdirectories are only created when a custom URL is specified.
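
As a rough sketch of that layout: the function name `cached_weights_path`, the choice of SHA-256, and the 16-character truncation are all assumptions made for illustration; the thread does not state which hash the PR uses.

```python
import hashlib
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def cached_weights_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to <cache_dir>/<url_hash>/<file_name>."""
    # Assumption: a truncated SHA-256 hex digest of the full URL.
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    # Keep the real file name from the URL path so the cached file stays recognizable.
    file_name = PurePosixPath(urlparse(url).path).name
    return cache_dir / url_hash / file_name
```

Because the hash covers the whole URL, two different URLs that end in the same file name land in different subdirectories and cannot overwrite each other.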

@Lilferrit
Contributor Author

> A few small comments.
>
> Two key things are still missing as well:
>
>   1. Documentation of this functionality on readthedocs (maybe on the file formats page).
>   2. Unit tests for the download behavior.

Regarding the readthedocs documentation, it looks like weight files aren't mentioned at all in the "Input Files" section of the file formats page. Is this something we want to add?

@bittremieux
Collaborator

Yes, this new (and also old) behavior of weights file downloading should still be added to the documentation.

Collaborator

@bittremieux bittremieux left a comment

It's starting to look quite nice for a complex piece of code. A few more comments.

file_response = requests.head(file_url)
if file_response.ok:
    if "Last-Modified" in file_response.headers:
        url_last_modified = email.utils.parsedate_to_datetime(
Collaborator

todo: I'm reluctant to include a library with rather different functionality for just a utility function (even though it's part of the standard libraries). Do we need extra functionality that strptime doesn't offer?

Contributor Author

Good call, I changed it to use strptime instead.
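
For reference, parsing a "Last-Modified" value with `strptime` could look like the sketch below. The format string and the name `parse_last_modified` are assumptions; the exact code that was merged is not shown in this thread.

```python
from datetime import datetime, timezone

# Assumed format of an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
HTTP_DATE_FORMAT = "%a, %d %b %Y %H:%M:%S %Z"


def parse_last_modified(header_value: str) -> datetime:
    """Parse a Last-Modified header value into a UTC-aware datetime."""
    # strptime returns a naive datetime here, so attach UTC explicitly
    # (HTTP dates are always expressed in GMT).
    parsed = datetime.strptime(header_value, HTTP_DATE_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)
```

One trade-off to keep in mind: `%a` and `%b` are locale-dependent in `strptime`, so this relies on the process running with English day and month names.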

import logging
import urllib
Collaborator

todo: I don't think this is necessary (rather the relevant import is urllib.parse a few lines further down).

if _is_valid_url(model):
    model = _get_weights_from_url(model, cache_dir)
elif not Path(model).is_file():
    raise ValueError(
Collaborator

todo: Add a similar logging statement.

@@ -11,6 +13,14 @@ When you're ready to use Casanovo for *de novo* peptide sequencing, you can inpu
All three of the above file formats can be used as input to Casanovo for *de novo* peptide sequencing.
As the official PSI standard format containing the complete information from a mass spectrometry run, mzML should typically be preferred.

### Model Weights

In addition to MS/MS spectra, Casanova also optionally accepts a model weights (.ckpt) input file when running in training, sequencing or evaluating mode.
Collaborator

typo: "Casanovo"

@Lilferrit Lilferrit requested a review from bittremieux July 30, 2024 16:04
Collaborator

@bittremieux bittremieux left a comment

Very nice implementation and great work on the unit tests. 👏

@Lilferrit Lilferrit merged commit 9e3b630 into dev Aug 2, 2024
@Lilferrit Lilferrit deleted the download-weights branch August 2, 2024 16:33
bittremieux added a commit that referenced this pull request Jul 9, 2025
* Remove `train_from_scratch` config option (#275)

Instead of having to specify `train_from_scratch` in the config file, training will proceed from an existing model weights file if this is given as an argument to `casanovo train`.

Fixes #263.

* Stabilize torch.topk() behavior (#290)

* Add epsilon to index zero

* Fix typo

* Use base PyTorch for repeating along the vocabulary size

* Combine masking steps

* Lint with updated black version

* Lint test files

* Add topk unit test

* Fix lint

* Add fixme comment for future

* Update changelog

* Generate new screengrabs with rich-codex

---------

Co-authored-by: Wout Bittremieux <wout@bittremieux.be>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Rename max_iters to cosine_schedule_period_iters (#300)

* Rename max_iters to cosine_schedule_period_iters

* Add deprecated config option unit test

* Fix missed rename

* Proper linting

* Remove unnecessary logging

* Test that checkpoints with deprecated config options can be loaded

* Minor change

* Add test for fine-tuning with deprecated config options

* Remove deprecated hyperparameters during model loading

* Include deprecated hyperparameter warning

* Test whether the warning is issued

* Verify that the deprecated option is removed

* Fix comments

* Avoid defining deprecated options twice

* Remap previous renamed config option `every_n_train_steps`

* Update changelog

---------

Co-authored-by: melihyilmaz <yilmazmelih97@gmail.com>

* Add FAQ entry about antibody sequencing

* Don't crash when multiple beams have identical peptide scores (#306)

* Test different beams with identical scores

* Randomly break ties for beams with identical peptide score

* Update changelog

* Don't remove unit test

* Allow csv to handle all newlines (#316)

* Add 9-species model weights link to FAQ (#303)

* Add model weights link

* Generate new screengrabs with rich-codex

* Clarify that these weights should only be used for benchmarking

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <wout@bittremieux.be>

* Add FAQ entry about antibody sequencing (#304)

* Add FAQ entry about antibody sequencing

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Melih Yilmaz <32707537+melihyilmaz@users.noreply.github.com>

* Allow csv to handle all newlines

The `csv` module tries to handle newlines itself. On Windows, this leads to line endings of `\r\r\n` instead of `\r\n`.

Setting `newline=''` produces the intended output on both platforms.

* Update CHANGELOG.md

* Fix linting issue

* Delete docs/images/help.svg

---------

Co-authored-by: Melih Yilmaz <32707537+melihyilmaz@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <wout@bittremieux.be>
Co-authored-by: William Stafford Noble <wnoble@uw.edu>
Co-authored-by: Wout Bittremieux <bittremieux@users.noreply.github.com>

* Don't test on macOS versions with MPS (#327)

* Prepare for release v4.2.0

* Update CHANGELOG.md (#332)

* implemented automatic CLI documentation generation

* converted CLI doc page to markdown, added CLI doc page intro

* fixed cli man page formatting bug

* implemented report_gen submodule

* report_gen documentation

* report_gen submodule test

* naming conventions

* naming conventions

* implemented save last checkpoint

* implemented last checkpoint saving using trainer save_checkpoint instead of callback

* final checkpoint file name

* added final epoch number to final checkpoint name

* linter rules

* changed casanovo cli help message to rst for compatability with sphinx-python

* implemented filter javascript

* resolved linter errors

* resolved more linter errors

* Generate new screengrabs with rich-codex

* PredictionWriter virtual class

* multi prediction writer

* LogPredicitonWriter wip

* implemented logger io

* removed report gen submodule

* logger io test

* logging info

* implemented end of run logging

* cli docuementation fixes/improvements

* Generate new screengrabs with rich-codex

* logger io test fix

* formatting fixes

* test file formatting

* fixed misspelling

* Restrict NumPy to pre-2.0

* Update changelog

* PredictionMultiWriter s\erialization

* log writer error handling

* reformatting

* verified skipped spectra counter

* Generate new screengrabs with rich-codex

* Export precursor charge as int in mzTab (#341)

* implemented integer charce to comply with mztab standard

* updated changelog

* updated changelog

* save final model using ModelCheckpoint callback

* added sphinx-click dependency to pyproject.toml

* implemented save model unit test

* fixed test_runner save checkpoints to working directory bug

* save final model test refactor

* save final model changelog entry

* migrated end of run report logging functionality to ms_io

* cli restructured text file formatting

* remove changes from model.py

* Generate new screengrabs with rich-codex

* moved logging utility functions to util.py

* requested changes

* CLI man page event handler implementation

* requested changes

* more requested changes

* requested changes

* Minor simplifications

* Fix tests

* End of run report logging error fix (#357)

* geq bug fix

* Better fix

---------

Co-authored-by: Wout Bittremieux <wout@bittremieux.be>

* Download weight file from URL (#349)

* downlaod weights from URL

* Generate new screengrabs with rich-codex

* reduced size of cached URL weight file names

* implemented hash cache resolution

* preliminary get file from url test

* hash resolition via dirname

* Generate new screengrabs with rich-codex

* Download weights documentation

* bad URL testcase - use Datetime library for processing request headers

* unit tests for setup_model, Github api mocking

* github weights download resolution documentation

* test_get_model_weights mac os fix

* Update CHANGELOG

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gwenneth Straub <gwennethstraub@noble103408.local>
Co-authored-by: Wout Bittremieux <wout@bittremieux.be>

* MzTab Validation (#362)

* implemented integer charce to comply with mztab standard

* updated changelog

* updated changelog

* Update paper reference (#361)

* implemented jmztab validation steps

* removed unecessary status code checks

* tiny model checkpoints

* upgrade codecove to v4

* fixed java command

* codecov fix

* seperated JMZtab validator command

* syntax error fix

* grep exit status handling

* status code logic

* export grep status code

* syntax error

* moved mzTab validation functionality to sh and bat scripts

* Generate new screengrabs with rich-codex

* git file permissions

* Generate new screengrabs with rich-codex

* platform specific validation

* Generate new screengrabs with rich-codex

* error condition test

* fail output fix

* remove test condition

* eliminate redundant output

* Generate new screengrabs with rich-codex

* success output

* Generate new screengrabs with rich-codex

* remove if exists

* Implemented jmzTab mzTab validation unit test

* Generate new screengrabs with rich-codex

* minor style change

---------

Co-authored-by: Wout Bittremieux <bittremieux@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Save best model (#365)

* save best model

* save best model

* updated unit tests

* remove save top k config item

* added save_top_k to deprecated config options

* changelog entry

* test case, formatting

* requested changes

* Eliminate evaluate Command (#359)

* prediction output in model eval mode

* eliminate eval command, introduce -e flag for predict command

* adapted unit test to new model runner and model functionality

* updated documentation

* removed log and result files

* Generate new screengrabs with rich-codex

* Update paper reference (#361)

* Bug report template (#360)

* bug report template

* punctuation, hardware description item

* Restrict NumPy to pre-2.0 (#344)

* Restrict NumPy to pre-2.0

* Update changelog

* Update paper reference (#361)

---------

Co-authored-by: Lilferrit <straub.gavin@gmail.com>

* upgrade codecove to v4 (#364)

* implemen eval mode at model runner level, fix unit test

* CLI documentation

* Generate new screengrabs with rich-codex

* requested changes

* Generate new screengrabs with rich-codex

* evaluation test cases

* file warnings, evaluation tests

* fixed ubuntu specific test case bug

* verify annotated mgf files

* verify annotated mgf files

* Generate new screengrabs with rich-codex

* Save best model (#365)

* save best model

* save best model

* updated unit tests

* remove save top k config item

* added save_top_k to deprecated config options

* changelog entry

* test case, formatting

* requested changes

* prediction output in model eval mode

* eliminate eval command, introduce -e flag for predict command

* adapted unit test to new model runner and model functionality

* updated documentation

* removed log and result files

* implemen eval mode at model runner level, fix unit test

* CLI documentation

* Bug report template (#360)

* bug report template

* punctuation, hardware description item

* Restrict NumPy to pre-2.0 (#344)

* Restrict NumPy to pre-2.0

* Update changelog

* Update paper reference (#361)

---------

Co-authored-by: Lilferrit <straub.gavin@gmail.com>

* requested changes

* evaluation test cases

* file warnings, evaluation tests

* fixed ubuntu specific test case bug

* verify annotated mgf files

* AnnotatedSpectrumIndex type error

* requested changes, changelog entry

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <bittremieux@users.noreply.github.com>

* PSM Data Class (#368)

* psm data class

* PepSpecMatch field naming and documentation

* File IO command line options revision (#372)

* file io console options

* output console io options

* file io options tests

* changelog entry

* revised changelog

* file io console options

* output console io options

* file io options tests

* changelog entry

* revised changelog

* Generate new screengrabs with rich-codex

* requested changes

* updated integration test

* requested changes

* Generate new screengrabs with rich-codex

* requested changes, output setup refactor

* ModelRunner documentation

* requested changes, _setup_output unit test

* ModelRunner output root bug fix, setup_model documentation, sequence output setup bug fix

* Generate new screengrabs with rich-codex

* logging format character

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Tensorboard io (#374)

* file io console options

* output console io options

* file io options tests

* changelog entry

* revised changelog

* file io console options

* output console io options

* file io options tests

* changelog entry

* revised changelog

* Generate new screengrabs with rich-codex

* requested changes

* updated integration test

* requested changes

* Generate new screengrabs with rich-codex

* requested changes, output setup refactor

* ModelRunner documentation

* requested changes, _setup_output unit test

* write tensorboard to output directory

* ModelRunner output root bug fix, setup_model documentation, sequence output setup bug fix

* Generate new screengrabs with rich-codex

* logging format character

* write tensorboard to output directory

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Log optimizer and training metrics to CSV file (#376)

* csv logger

* optimizer metrics logger

* metrics logging unit tests

* config item retrieval, additional requested changes

* Generate new screengrabs with rich-codex

* changelog update

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Eval metrics and circular import bug fix. (#380)

* eval metrics bug fix

* better eval metrics bug fix

* eval metrics bug fix

* better eval metrics bug fix

* eval stats unit test, circular import fix

* log metrics unit test

* removed unused import

* log metrics refactor, additional log metrics test case

* aa_match_batch hanles none, additional skipped spectra test cases

* Log optimizer and training metrics to CSV file (#376)

* csv logger

* optimizer metrics logger

* metrics logging unit tests

* config item retrieval, additional requested changes

* Generate new screengrabs with rich-codex

* changelog update

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* aa_match_batch and aa_match handle None

* top_match eval metrics warning

* removed unused import

* log metrics refactor, additional log metrics test case

* aa_match_batch hanles none, additional skipped spectra test cases

* aa_match_batch and aa_match handle None

* top_match eval metrics warning

* eval metrics bug fix

* better eval metrics bug fix

* eval stats unit test, circular import fix

* log metrics unit test

* removed unused import

* log metrics refactor, additional log metrics test case

* aa_match_batch hanles none, additional skipped spectra test cases

* aa_match_batch and aa_match handle None

* top_match eval metrics warning

* removed unused import

* log metrics refactor, additional log metrics test case

* metrics file logging bug fix

* aa_match test cases, minor aa_match refactor

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update pytorch lightning requirement

* Switched to Geometric Mean for Peptide Level Scores (#392)

* switched to geometric mean for peptide level score

* non zero aa score test case, changelog item

* Generate new screengrabs with rich-codex

* Casanovo Nextflow Workflow Documentation (#391)

* nextflow documentation

* nextflow nav prefix, more nextflow docs details

* grammatical fixes

* grammatical fixes

* fixed links

* Read the Docs Update to Reflect New Command Line Functionality (#390)

* nextflow documentation

* nextflow nav prefix, more nextflow docs details

* grammatical fixes

* grammatical fixes

* fixed links

* nextflow documentation

* nextflow nav prefix, more nextflow docs details

* grammatical fixes

* grammatical fixes

* fixed links

* update Read the Docs with new functionality

* rephrasing

* update file formats section

* updated faq note; training faq section changes

* Add Casanovo-DB Functionality (#325)

* begin adding tests for annotate mode

* add basic test for annotate mode

* added test case for annotate mode and modified method

* very rough sketch of db upgrade (untested)

* small upgrades to documentation

* better output formatting

* all tests added

* remove minor debugging print statement

* Generate new screengrabs with rich-codex

* remove excess info logs, add monkeypatch to tests

* mp fix

* fix line lengths and modify test

* Generate new screengrabs with rich-codex

* justins requested fixes

* added minor changes as requested by Wout

* partial fixes requested by wout. Lots of subclassing removed

* documentation fixes and starting to cleanup batching code

* cleaned up on_predict_batch_end, TODOs for calc_mz

* add proper calc_mz calculation with depthcharge

* rough implementation

* tested implementation of db search

* fix for issue with 0 candidates

* minor fixes added

* reordered and renamed variables for consistency

* casanovo-db full working version with code simplification

* Generate new screengrabs with rich-codex

* fix batching issues

* small fixes regarding documentation, import syntax, etc.

* add proteindatabase

* Generate new screengrabs with rich-codex

* finish proteindatabase

* all comments addressed

* new comments addressed

* final adjustments added

* minor changes regarding formatting and small efficiency boosts

* changes before reformatting config

* replace all occurences of "max_length" with "max_peptide_len"

* added nonspecific digestion

* minor comments

* full branch comments addressed

* Generate new screengrabs with rich-codex

* updated and fixed failed tests

* add mztab validation to dbsearch test

* lint fix

* fix integration test

* fix unit tests

* force fix test

* clean up test_digest_fasta_enzyme

* adjust test_digest_fasta_mods

* allows top_match filtering for casanovo-db

* change default value for protein value in PepSpecMatch

* reverse issues with decoder

* update test and remove logging statement

* db_utils fixes

* updates to dataloaders, model_runner, and model.py

* near final changes for all but db_utils

* line length fixes

* Minor refactoring and type hint fixes

* Use mask for more efficient candidate filtering

* Reorder methods in logical order

* Fix unit tests

* Directly generate DB peptides as DataFrame

* Fix type hints and line lengths

* Generate new screengrabs with rich-codex

* Refactor batching to avoid code repetition

* More minor refactoring

* Reformat with black

* Minor fix

* Fix output name crash

* Fix AA score masking

* Fix PSM export

* Less verbose logging of skipped peptides

* Appropriate end-of-run reporting

* Fix PSM export from de novo

* Generalize end-of-run reporting

* Log additional information on spectra with no matching candidates

* Fix linting issue

* Fix some testing warnings

* Log digestion settings

* Reduce logging level for spectra without candidates

* Round peptide masses for consistent sorting

* Fox linting

* Remove superfluous PSM export

* Update changelog

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <wout@bittremieux.be>

* Fix Log Entry to Config File Bug (#406)

* log entry to config file bug fix

* Generate new screengrabs with rich-codex

* file suffix fix

* comprehensive configure integration test, shared file io commands\

* Generate new screengrabs with rich-codex

* Minor simplification

* Update changelog

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <wout@bittremieux.be>

* Upgrade dependencies with security vulnerabilities (#421)

* Update docs for Casanovo-DB (#404)

* Update getting_started.md

image not added yet

* Update cli.rst

added db-search to :commands:

* Update getting_started.md

add svg

* Update file_formats.md

add information on file formats, update config print, explain accession field

* add sample data and update getting_started

* Generate new screengrabs with rich-codex

* Fix incorrect image in getting_started.md

Fix screengrabs

* Generate new screengrabs with rich-codex

* fixed getting_started and pruned mouse fasta

* Minor edits

* Fixes and updates

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <wout@bittremieux.be>

* Windows test pipeline failure on `dev` branch fix (#432)

* runner unit tests fix

* Output root suffix fix (#434)

* output root suffix fix

* clean up; changelog

* _setup_output cleanup

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Reversed peptide aa scores hotfix (#417) (#435)

* reverse aa scores hotfix

* reverse aa scores hotfix

* Migration to depthcharge v0.4.8 (#442)

* Migration to depthcharge v0.4.8 (#350)

* migration to depthcharge v0.4.8

* shuffling training set by default

* Reformat with Black

* Fix formatting again after merge

* Resolve requested changes

* Reformat with Black

* removed invalid imports

* removed to be added functionality (for now)

* tensorboard logger

* circular import bug

* removed tensorboard unit tests

* beam search decode unit tests (IP)

* teast_beam_search decode test update

* test_eval_metrics test update

* unit tests updates

* spectrum id unit tests

* integration test fix

* model prediction io flow fixes

* PyLightning logging refactor

* mgf file reader title field formatting

* integration tests fix

* integration tests

* test_initialize_model fix

* test_save_and_load_weights fix

* test_save_and_load_weights_deprecated fix

* test_evaluate fix, evaluate unnanotated peak file error handling

* test_evaluate fix, evaluate unnanotated peak file error handling

* test_eval_metrics fix

* test_spectrum_id tests fix

* unit tests fixes

* teast_beam_search_decode fix

* negative residue work around

* depthcharge upgrade - all unit tests pass

* pylance depthcharge compatability fix

* removed scans field from dataloaders

* non db functionality working

* import orders, CasanovoDB psm batching

* CasanovoDB unit tests

* no batch made edge case

* mass caclulation

* CasanovoDB mass mod fixes

* remove unsqueeze batch method

* reduced test epochs from 20 to 15

* integration test fix

* integration test fix

* psm batch generator unit test

* cleanup debug code

* disable multi threading on linux

* skip n_threads unit test

* fixed double batching bug

* use tokens to compare peptides

---------

Co-authored-by: Daniela Klaproth-Andrade <salazar@in.tum.de>
Co-authored-by: William Fondrie <fondriew@gmail.com>
Co-authored-by: Lilferrit <straub.gavin@gmail.com>
Co-authored-by: Gwen Straub <gwen@Gwens-MacBook-Air.local>
Co-authored-by: Wout Bittremieux <bittremieux@users.noreply.github.com>

* Consistently order config values

* Minor refactoring data classes

* Refactor data loaders and model runners

* Update model

* Small edits in tests

* Proper linting

* Fix some unit tests

* Fix detokenizing

* Fix linting

* Minor refactoring de novo model

* Refactor and simplify database search model

* Fix linting

* Fix PSM batching for database search

* Fix reverse decoding unit test

* Fix peptide detokenizing

* Fix metrics logging

* Fix tensor shape mismatch

* Fix training-validation unit test

* Fix unit test for reverse decoding

* Update changelog

* Reduce database test precursor mass tolerance

* Fix linting

---------

Co-authored-by: andradesalazar <52212483+andradesalazar@users.noreply.github.com>
Co-authored-by: Daniela Klaproth-Andrade <salazar@in.tum.de>
Co-authored-by: William Fondrie <fondriew@gmail.com>
Co-authored-by: Lilferrit <straub.gavin@gmail.com>
Co-authored-by: Gwen Straub <gwen@Gwens-MacBook-Air.local>

* Log system information at start (#456)

* Log system information at start

* Fix peptide length logging

* Switched peptide level score to product of AA scores and removed AA score adjustement step (#458)

* calculate peptide score as product of AA scores

* _aa_pep_score function signature, changelog

* fixed documentation

* Log system information at start (#456)

* Log system information at start

* Fix peptide length logging

* Generate new screengrabs with rich-codex

* changelog entry

---------

Co-authored-by: Wout Bittremieux <bittremieux@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Combine N-terminal AA score with leading amino acid score (#459)

* sub residue config

* added test

* switched to product

* changelog; small change to combined score calculation

* fixed changelog

* updated changelog

* Discard low-quality spectra with too few peaks (#455)

* Rename `n_peaks` config option to `max_peaks`

* Discard spectra with too few peaks

Also filter low-intensity peaks _after_ intensity scaling, otherwise it's an arbitrary value coming from the input peak file.

* Test spectrum preprocessing

* Update changelog

* Update n_peaks renaming to max_peaks in changelog

* Fix linting

* Fix config for unit tests

* Fix tests checking for spectra with too few peaks

* Fix config for min_peaks integration test

* Depthcharge reverse detokinization bug work-around (#462)

* reverse detokenization bug work around

* added fixmes

* Flexible format for scan titles (#369)

* Use different scan titles in test mzML

* Update tests for matching scan id

These currently still fail, as expected.

* Casanovo DB `replace_isoleucine_with_leucine` bug fix (#472)

* replace_isoleucine_with_leucine bug fix

* fixed test case

* n-term mods config option fix (#475)

* Db Device Fix (#477)

* db-device-fix

* tokens device fix

* Casanovo DB Peptide Sequence Fix (#478)

* keep original sequence string

* added test

* Casanovo DB AA Score Fix (#480)

* aa score fix

* calc match score unit test

* calc mass type martialing

* reverse aa scores fix

* calc match score, rename variables

* Numerically Stable Peptide Score (#483)

* stable product score

* numerically stable peptide scores

* renaming

* eliminate magic num

* Casanovo DB Add Stop Token (#481)

* add stop token

* stop token test

* Improved beam search efficiency

* Improved beam search efficiency based on the latest model.py version

* Speed up inference time

* Fixed version

* fixed version

* test

* test2

* test3

* new model

* Update model.py and unit test

* Update CHANGELOG.md

* Omit stop token from reported DB search AA scores (#488)

* Casanovo DB Merge N-terminal Scores with Leading AA (#489)

* db merge n term score

* use base class on predict batch end

* Prepare for new release

* Tokenizer bugfix

* Fix linting

* Fix detokenizing bug introduced by merge

---------

Co-authored-by: Melih Yilmaz <32707537+melihyilmaz@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: melihyilmaz <yilmazmelih97@gmail.com>
Co-authored-by: wsnoble <wnoble@uw.edu>
Co-authored-by: Joshua Klein <mobiusklein@gmail.com>
Co-authored-by: Lilferrit <straub.gavin@gmail.com>
Co-authored-by: Lilferrit <gwenneth.straub@gmail.com>
Co-authored-by: Gwenneth Straub <gwennethstraub@noble103408.local>
Co-authored-by: justin-a-sanders <60298590+justin-a-sanders@users.noreply.github.com>
Co-authored-by: Varun Ananth <varunananth1@gmail.com>
Co-authored-by: andradesalazar <52212483+andradesalazar@users.noreply.github.com>
Co-authored-by: Daniela Klaproth-Andrade <salazar@in.tum.de>
Co-authored-by: William Fondrie <fondriew@gmail.com>
Co-authored-by: Gwen Straub <gwen@Gwens-MacBook-Air.local>
Co-authored-by: Nameless <59132154+NameLessEG@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Allow specifying different model weights from GitHub Assets
2 participants