Accounting for genome region specific coverage biases #17

We have observed in our data (generated on several different Illumina machines and with several library prep methods) that local coverage density varies across the genome in a predictable way that is consistent across all genotypes. When we compute binned read coverage for any two genotypes, the per-bin coverage of one genotype is correlated with the per-bin coverage of the other. If reads were sampled uniformly at random across the genome, we would expect no such correlation. In the real data, the correlation coefficient also stays the same regardless of bin size.
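For concreteness, the comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code we actually ran: it assumes read start positions have already been extracted per genotype as plain integer lists, and the function names (`bin_counts`, `pearson`, `coverage_correlation`) are hypothetical.

```python
import math


def bin_counts(positions, genome_length, bin_size):
    """Count read start positions (0-based) falling into fixed-width bins."""
    n_bins = math.ceil(genome_length / bin_size)
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when either sample has constant coverage.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def coverage_correlation(positions_a, positions_b, genome_length, bin_sizes):
    """Per-bin coverage correlation between two genotypes, for each bin size."""
    return {
        size: pearson(
            bin_counts(positions_a, genome_length, size),
            bin_counts(positions_b, genome_length, size),
        )
        for size in bin_sizes
    }
```

Calling `coverage_correlation` with a range of bin sizes gives one coefficient per size, which is the quantity we compare between real and simulated reads.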

Reads produced through wg-sim also show this correlation, although the correlation coefficient is smaller and only approaches that of the real data at bin sizes above 100 kb. Is there a way to manipulate this correlation coefficient ourselves?
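To make the question concrete, one way such control could look is sketched below. This is purely an assumption on my part and not something wg-sim offers as far as I know: each read is drawn either from a per-bin weight profile shared by all genotypes or uniformly across bins, and a hypothetical `bias_strength` parameter sets the mixing proportion. The function name and all parameters are illustrative.

```python
import random


def sample_positions(n_reads, weights, bin_size, bias_strength, rng):
    """Draw read start positions from a mixture of two samplers.

    With probability `bias_strength` a read's bin is drawn from the shared
    `weights` profile; otherwise the bin is drawn uniformly. The position
    inside the chosen bin is always uniform.
    """
    n_bins = len(weights)
    biased_bins = rng.choices(range(n_bins), weights=weights, k=n_reads)
    positions = []
    for bin_index in biased_bins:
        if rng.random() >= bias_strength:
            # Fall back to uniform sampling for this read.
            bin_index = rng.randrange(n_bins)
        positions.append(bin_index * bin_size + rng.randrange(bin_size))
    return positions
```

If two genotypes were simulated with the same `weights`, I would expect a larger `bias_strength` to raise the correlation of their binned coverage and a value of 0 to remove the shared regional signal, but I have not verified how this behaves across bin sizes.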

Ayush
