Accounting for genome region specific coverage biases #17

@AyushSaxena

We have observed in our data (generated on several different Illumina machines and with different library prep methods) that local coverage density varies across the genome, and that it does so predictably across all genotypes. When we compute read coverage per bin for any two genotypes, the per-bin coverage values of the two genotypes are correlated. If sampling across the genome were random, we would expect no correlation. In the real data, the correlation coefficient also stays the same regardless of the bin size.
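For concreteness, the per-bin comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline used for our data: the function names (`bin_counts`, `pearson`, `coverage_correlation`) are hypothetical, each read is reduced to its 0-based start position on a single contig, and coverage is approximated by the number of read starts per fixed-width bin.

```python
import math
import random


def bin_counts(read_starts, genome_length, bin_size):
    """Count reads whose start position falls in each fixed-width bin."""
    n_bins = math.ceil(genome_length / bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one vector is constant
    return sxy / math.sqrt(sxx * syy)


def coverage_correlation(starts_a, starts_b, genome_length, bin_size):
    """Correlate per-bin read counts of two samples (assumed helper)."""
    counts_a = bin_counts(starts_a, genome_length, bin_size)
    counts_b = bin_counts(starts_b, genome_length, bin_size)
    return pearson(counts_a, counts_b)


if __name__ == "__main__":
    # Toy data: two samples drawn uniformly at random from the same contig.
    # Under purely random sampling the correlation should be close to zero.
    rng = random.Random(0)
    genome_length = 1_000_000
    sample_a = [rng.randrange(genome_length) for _ in range(50_000)]
    sample_b = [rng.randrange(genome_length) for _ in range(50_000)]
    for bin_size in (1_000, 10_000, 100_000):
        r = coverage_correlation(sample_a, sample_b, genome_length, bin_size)
        print(bin_size, round(r, 3))
```

With real alignments, the start positions would come from the BAM files of the two genotypes. Uniform random draws are used here only to show the null expectation of no correlation.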

Reads produced through wg-sim also show this correlation, although the correlation coefficient is smaller and approaches that of the real data at bin sizes of >100kb. Is there a way to manipulate this correlation coefficient ourselves?

Ayush