MatthiasZepper

Proposed change

This PR introduces a new argument, --position, to the external sub-command of umi-transfer. The --position argument allows users to specify where the UMI is inserted, with the following options:

  • header (default) – UMI is appended to the read header, as traditionally used.
  • inline – UMI is inserted directly before the read sequence, enabling compatibility with Sarek.
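
To illustrate the difference between the two options, here is a minimal Python sketch. It is not the tool's actual implementation: the `FastqRecord` type, the `transfer_umi` function, and the `:` delimiter are assumptions made for illustration, as is the choice to carry the UMI's base qualities along in `inline` mode so that sequence and quality strings stay the same length.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FastqRecord:
    """Hypothetical, simplified FASTQ record (name without the leading '@')."""
    name: str
    sequence: str
    quality: str


def transfer_umi(read, umi, position="header", delimiter=":"):
    """Return a copy of `read` carrying the UMI taken from the `umi` record.

    Illustrative only; the delimiter and exact header layout are assumptions.
    """
    if position == "header":
        # Append the UMI to the read ID, keeping any comment after the first space.
        read_id, sep, comment = read.name.partition(" ")
        return replace(read, name=f"{read_id}{delimiter}{umi.sequence}{sep}{comment}")
    if position == "inline":
        # Prepend UMI bases and their qualities so both strings stay aligned.
        return replace(
            read,
            sequence=umi.sequence + read.sequence,
            quality=umi.quality + read.quality,
        )
    raise ValueError(f"unknown position: {position!r}")


read = FastqRecord("read1 1:N:0:ACGT", "TTGGCC", "IIIIII")
umi = FastqRecord("read1 2:N:0:ACGT", "AACG", "FFFF")

in_header = transfer_umi(read, umi, position="header")
in_line = transfer_umi(read, umi, position="inline")
```

In this sketch, `header` leaves sequence and quality untouched and only changes the read name, whereas `inline` lengthens the read by the UMI length and leaves the name as it was.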

On a technical level, all read processing has been moved into a separate module. This will also be useful should we extend the tool with an umi-transfer inline sub-command, which would partly perform the reverse action (extracting the UMI from the read instead of joining the two).
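
As a rough idea of what that reverse action could look like, here is a hypothetical Python sketch. Nothing like it exists in this PR: the `Record` type, the `extract_inline_umi` function, and the assumption of a fixed-length UMI at the start of the read are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified FASTQ record (name without the leading '@')."""
    name: str
    sequence: str
    quality: str


def extract_inline_umi(read, umi_length):
    """Split a read into (umi, trimmed_read), assuming a fixed-length UMI prefix."""
    if umi_length < 0 or umi_length > len(read.sequence):
        raise ValueError("umi_length must be between 0 and the read length")
    umi = Record(read.name, read.sequence[:umi_length], read.quality[:umi_length])
    trimmed = Record(read.name, read.sequence[umi_length:], read.quality[umi_length:])
    return umi, trimmed


combined = Record("read1", "AACGTTGGCC", "FFFFIIIIII")
umi_part, read_part = extract_inline_umi(combined, umi_length=4)
```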

Motivation

Sarek supports the use of consensus reads to increase the accuracy of variant calls. Consensus reads are formed by identifying, grouping and collapsing duplicate reads that originate from the same DNA molecule. Sequencing errors are corrected in the process, which reduces the number of artifactual variant calls.

Sarek uses fgbio for consensus read formation and processing. While fgbio supports using UMIs from external files, the pipeline's sample sheet doesn't allow a third FastQ file as input. Hence, UMIs must be integrated into the reads first.

Outlook

Possibly, this change justifies a v1.6 release?

@MatthiasZepper MatthiasZepper changed the base branch from main to dev February 4, 2025 18:33