
Automated Nextstrain Clade Monitoring & Variant Definition Creation #6



Open: wants to merge 53 commits into master

@gordonkoehn (Collaborator) commented Jul 14, 2025

This PR implements a complete automation pipeline for monitoring new Nextstrain clade designations and automatically creating variant definition files for wastewater surveillance. The definitions are based on mutations found in clinical sequences on CovSpectrum once the new variant appears there.

The full design of this automation is described here on Miro.

This setup mirrors and automates our existing manual procedure (manual step --> automated replacement):

  1. Get notice of a new variant --> get a GitHub notification for a new issue
  2. Wait for the new variant definitions to appear once Nextclade / CovSpectrum have processed them, ~16 h after the new variants are assigned in ncov
  3. Run cojac sig-generate manually and open a PR --> a PR with definitions fetched from CovSpectrum appears automatically, assigned to us

For both the issues and the PRs, we can choose whom to notify/assign, e.g., @gordonkoehn, Kyra? David?

For an example alert, see here:

The scheduling of the GitHub Actions is hard to test, so we may well have to iterate on this mechanism. So far I am trusting the AI-assisted setup; all manual triggers I have run work fine.

Detailed flow:

  • Triggers automatically when new clades are detected in ncov and opens an issue
  • Schedules a daily CovSpectrum query for 7 days
  • Queries the CovSpectrum API for nucleotide signature mutations
  • Generates properly formatted YAML files in the voc directory
  • Creates pull requests with variant definitions
  • Links PRs to originating issues via "Closes #X"
  • Comments on issues with status updates and error details
  • Prevents duplicate PRs for the same variant
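A rough sketch of the "query → YAML" steps, assuming the CovSpectrum response has already been reduced to records like {"mutation": "C23403T", "proportion": 0.99} and assuming a cojac-style definition with a variant header and a mut map of position: 'REF>ALT'. The record shape, the 0.8 cut-off, the function names and the example records are illustrative assumptions, not the code in this PR.

```python
import re

# Assumed mutation code format, e.g. "C23403T" -> ref base, 1-based position, alt base.
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT-])$")


def select_signature_mutations(records, min_proportion=0.8):
    """Keep mutations at or above an (assumed) proportion cut-off.

    Returns {position: "REF>ALT"} ordered by genome position.
    """
    selected = {}
    for rec in records:
        if rec["proportion"] < min_proportion:
            continue
        match = MUTATION_RE.match(rec["mutation"])
        if match is None:
            # Skip codes we cannot parse (e.g. insertions) rather than guess.
            continue
        ref, pos, alt = match.groups()
        selected[int(pos)] = f"{ref}>{alt}"
    return dict(sorted(selected.items()))


def render_voc_yaml(short, pangolin, nextstrain, mutations):
    """Render a cojac-style definition as YAML text (assumed layout, no PyYAML needed)."""
    lines = [
        "variant:",
        f"  short: '{short}'",
        f"  pangolin: '{pangolin}'",
        f"  nextstrain: '{nextstrain}'",
        "mut:",
    ]
    lines += [f"  {pos}: '{change}'" for pos, change in mutations.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example records, only to show the data flow.
    records = [
        {"mutation": "C23403T", "proportion": 0.99},
        {"mutation": "A100G", "proportion": 0.30},
        {"mutation": "G5000A", "proportion": 0.85},
    ]
    print(render_voc_yaml("xec", "XEC", "24F", select_signature_mutations(records)))
```

Rendering the YAML by hand keeps the key order and quoting fixed, which makes the generated files easy to diff against hand-written ones.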

Workflow Architecture

  • Monitor (voc-monitor.yaml) - Detects new clades → Creates enriched issues
  • Creator (variant-creator.yaml) - Triggered by labeled issues → Creates variant definitions
  • Retry (variant-retry.yaml) - Scheduled retries for failed attempts
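How the Creator might decide whether a labeled issue should be processed, as a minimal sketch: the issue dict follows the GitHub REST API shape (state, labels[].name, body), while the "new-clade" label and the "Clade:" / "Pango lineage:" body fields are hypothetical stand-ins for whatever template the Monitor actually writes.

```python
import re

# Hypothetical trigger label and issue-body fields; the real template may differ.
TRIGGER_LABEL = "new-clade"
FIELD_RE = re.compile(r"^\s*(Clade|Pango lineage)\s*:\s*(\S+)\s*$", re.MULTILINE)


def parse_issue(issue):
    """Return (clade, lineage) for an open, labeled issue, or None to skip it."""
    # Closed issues are already resolved; never act on them.
    if issue.get("state") != "open":
        return None
    labels = {label["name"] for label in issue.get("labels", [])}
    if TRIGGER_LABEL not in labels:
        return None
    # findall yields (field, value) pairs, which map directly into a dict.
    fields = dict(FIELD_RE.findall(issue.get("body") or ""))
    if "Clade" not in fields or "Pango lineage" not in fields:
        return None
    return fields["Clade"], fields["Pango lineage"]
```

Returning None instead of raising lets the workflow exit gracefully on issues it should ignore.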

Testing

  • Tested locally; all logic components validated
  • CovSpectrum API integration confirmed working
  • Generated files match expected YAML structure
  • Issue parsing and PR creation logic verified

This automation eliminates manual variant definition creation and ensures timely updates to the wastewater monitoring system.

=======

Before merging, we still need to "arm" the workflow, that is:

  • Change the target branch that the updates are merged into

@gordonkoehn gordonkoehn force-pushed the feat/nextclade_monitor branch from 100d638 to 2d988c0 July 15, 2025 07:50
@gordonkoehn gordonkoehn changed the title from "Automatic Nextclade Clade Assigment Alerts" to "Automated Nextstrain Clade Monitoring & Variant Definition Creation" Jul 15, 2025
Gordon J. Köhn and others added 8 commits July 15, 2025 10:15
- Triggers on push to feat/nextclade_monitor branch
- Uses issue #10 as default for testing
- TEMPORARY: Remove push trigger after testing
- Checkout master branch instead of feature branch
- Add _bot suffix to distinguish automated files (e.g., xec_mutations_full_bot.yaml)
- Update PR titles and commit messages to indicate automation
- Ensure proper base branch for PR creation
- Only process issues that are still open
- Exit gracefully if issue is closed or not found
- Prevents processing of already resolved issues
- Retry workflow already filters for open issues
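The naming and linking conventions from these commits could look roughly like this. The _bot suffix follows the xec_mutations_full_bot.yaml example above; the "[bot]" title prefix and the title-based duplicate check are assumptions for illustration. GitHub closes the issue referenced by "Closes #N" once the PR is merged into the default branch.

```python
def bot_filename(short_name):
    """e.g. 'XEC' -> 'xec_mutations_full_bot.yaml' (suffix marks automated files)."""
    return f"{short_name.lower()}_mutations_full_bot.yaml"


def pr_title(lineage):
    # Hypothetical title convention that flags the PR as automated.
    return f"[bot] Add variant definition for {lineage}"


def pr_body(lineage, issue_number):
    """'Closes #N' links the PR to its originating issue."""
    return (
        f"Automated variant definition for {lineage}, generated from CovSpectrum.\n\n"
        f"Closes #{issue_number}\n"
    )


def is_duplicate(lineage, open_pr_titles):
    """Skip creation if an open PR with the same title already exists."""
    return pr_title(lineage) in set(open_pr_titles)
```

Matching on a deterministic title is the simplest dedup key; matching on the branch name would work the same way.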
@gordonkoehn gordonkoehn self-assigned this Jul 16, 2025
@gordonkoehn gordonkoehn marked this pull request as draft July 18, 2025 09:27
Gordon J. Köhn and others added 20 commits July 22, 2025 15:39
- Add CovSpectrum links in issues for immediate variant checking
- Include comprehensive CovSpectrum URLs in PRs (Swiss, global data)
- Add 'keep open' instruction in issues for automation
- Update retry schedule to every 4 hours (was daily)
- Reduce max retry age to 2 days (was 7 days)
- Add @gordonkoehn notifications in success/failure comments
- Enhance variant-monitoring.md with complete technical design
- Restore daily retry at 18:00 UTC (had been incorrectly changed to every 4 hours)
- Extend retry period back to 7 days (CovSpectrum can take up to 7 days)
- Move CovSpectrum links to Resources section in issues
- Update all retry messages to reflect correct timing
- Confirm URLs use correct nextcladePangoLineage parameter
- Merged remote tracking file with new clades 25B and 25C
- Keep both local workflow improvements and remote clade updates
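A sketch of the retry gate described in these commits: daily at 18:00 UTC, for at most 7 days, open issues only. It assumes issues carry the ISO-8601 created_at string that the GitHub REST API returns and a hypothetical has_pr flag computed elsewhere; in the workflow itself the schedule would presumably be a cron expression such as 0 18 * * *.

```python
from datetime import datetime, time, timedelta, timezone

RETRY_AT = time(18, 0, tzinfo=timezone.utc)  # daily slot, per the commit notes
MAX_RETRY_AGE = timedelta(days=7)            # CovSpectrum can take up to 7 days


def should_retry(issue, now):
    """Retry only open issues younger than MAX_RETRY_AGE that have no PR yet."""
    if issue["state"] != "open" or issue.get("has_pr", False):
        return False
    # GitHub timestamps end in "Z"; fromisoformat wants an explicit offset.
    created = datetime.fromisoformat(issue["created_at"].replace("Z", "+00:00"))
    return now - created <= MAX_RETRY_AGE


def next_retry(now):
    """Next 18:00 UTC slot strictly after `now` (expects an aware UTC datetime)."""
    candidate = datetime.combine(now.date(), RETRY_AT)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

Keeping the age check in code, rather than relying on the schedule alone, means a manually triggered run also respects the 7-day window.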
@kirschen-k kirschen-k removed their request for review July 22, 2025 15:08
@gordonkoehn (Collaborator, Author)

[Two screenshots, 2025-07-22 17:17]

Products ;)

@gordonkoehn gordonkoehn marked this pull request as ready for review July 22, 2025 15:18
@gordonkoehn gordonkoehn requested a review from kirschen-k July 22, 2025 15:18
@gordonkoehn (Collaborator, Author) commented Jul 23, 2025

Before merging:

@gordonkoehn (Collaborator, Author)

Will merge with a single review. Note: this GitHub automation will push to main to update its ledger of known variants.
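The "ledger of known variants" boils down to a set difference between the clades currently defined upstream and those already recorded. Below is a minimal sketch assuming a JSON ledger with a "clades" list; the file format and function names are hypothetical, and fetching the current clade list from ncov is left out.

```python
import json
from pathlib import Path


def load_known_clades(ledger_path):
    """Read the (hypothetical) JSON ledger of clades already seen; empty if missing."""
    path = Path(ledger_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text())["clades"])


def find_new_clades(current_clades, known_clades):
    """Clades present upstream but absent from the ledger, in stable order."""
    return sorted(set(current_clades) - set(known_clades))


def update_ledger(ledger_path, known_clades, new_clades):
    """Persist the union so the next run does not re-open issues for the same clades."""
    merged = sorted(set(known_clades) | set(new_clades))
    Path(ledger_path).write_text(json.dumps({"clades": merged}, indent=2) + "\n")
    return merged
```

Writing the ledger back after each run is what requires the push to main: without it, every run would report the same clades as new.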

Labels: None yet · Projects: None yet · 2 participants