
Evaluating SV callers

Luca Santuari edited this page Apr 6, 2018 · 4 revisions

Datasets

Ground truth

To quantify the ability of each caller to detect SVs, we use multiple ground truth datasets. These are collections of SVs that have been validated with multiple sequencing technologies and through comparison with independent datasets. Here is a tentative list:

  1. The pilot genome NA12878 from the Genome in a Bottle (GiaB) Consortium
  2. The COLO829 dataset of the Hartwig Medical Foundation
  3. The OC26 dataset (Kloosterman group)
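
As an illustration of how a caller could be scored against one of these ground truth sets, here is a minimal Python sketch. It is not the evaluation procedure used in this project: the `SV` record, the `matches` rule (same chromosome and SV type, both breakpoints within a tolerance) and the default tolerance of 200 bp are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    """Hypothetical minimal SV record used only for this sketch."""
    chrom: str
    start: int
    end: int
    svtype: str


def matches(call: SV, truth: SV, tolerance: int) -> bool:
    """A call matches a truth SV if chromosome and type agree and
    both breakpoints lie within `tolerance` bp of the truth breakpoints."""
    return (
        call.chrom == truth.chrom
        and call.svtype == truth.svtype
        and abs(call.start - truth.start) <= tolerance
        and abs(call.end - truth.end) <= tolerance
    )


def evaluate(calls: List[SV], truth_set: List[SV],
             tolerance: int = 200) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of `calls` against `truth_set`.

    Each truth SV can be matched by at most one call, so duplicate
    calls of the same event do not inflate the true positive count.
    """
    matched_truth = set()
    true_positives = 0
    for call in calls:
        for i, truth in enumerate(truth_set):
            if i not in matched_truth and matches(call, truth, tolerance):
                matched_truth.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


truth = [SV("1", 1000, 2000, "DEL"), SV("2", 5000, 6000, "DUP")]
calls = [SV("1", 1050, 1980, "DEL"), SV("3", 100, 200, "INV")]
precision, recall, f1 = evaluate(calls, truth)
```

In this toy example one of the two calls matches a truth SV and one of the two truth SVs is recovered. The tolerance should be chosen to reflect the breakpoint resolution of the callers being compared.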

Genome in a Bottle: NA12878/NA24385 tumor-like mixture

Description

SV VCF and BED files

Internal dataset for Tumor and Normal BAM files (17% clipped reads).

VCF comparison

mergeVCF

mergeVCF can be used to merge VCF files. The output is a VCF file interleaved with information on the records.

SURVIVOR merge

SURVIVOR has a merge tool that can be used for merging VCF files.
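
To make the idea of merging calls from several callers concrete, the following Python sketch clusters calls that share chromosome and SV type and lie close together, then keeps clusters supported by enough distinct callers. This is only an illustration of the general approach. It does not reproduce the algorithm of mergeVCF or SURVIVOR, and the `Call` record, `merge_calls`, `max_dist` and `min_support` are hypothetical names and parameters chosen for the example.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    """Hypothetical single-breakpoint SV call used only for this sketch."""
    caller: str
    chrom: str
    pos: int
    svtype: str


def merge_calls(calls: List[Call], max_dist: int = 1000,
                min_support: int = 2) -> List[Tuple[str, int, str, List[str]]]:
    """Greedily cluster calls and report clusters with enough support.

    Calls are sorted by (chrom, svtype, pos). A call joins the current
    cluster if it has the same chromosome and type as the cluster's last
    member and lies within `max_dist` bp of it (single-linkage chaining).
    """
    clusters: List[List[Call]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.pos)):
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last[-1].chrom == call.chrom
            and last[-1].svtype == call.svtype
            and call.pos - last[-1].pos <= max_dist
        ):
            last.append(call)
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_support:
            # Representative position: the middle element of the sorted
            # positions (the upper middle one for even-sized clusters).
            positions = sorted(c.pos for c in cluster)
            rep_pos = positions[len(positions) // 2]
            merged.append((cluster[0].chrom, rep_pos, cluster[0].svtype, callers))
    return merged


example_calls = [
    Call("callerA", "1", 1000, "DEL"),
    Call("callerB", "1", 1200, "DEL"),
    Call("callerA", "1", 50000, "DEL"),
    Call("callerB", "2", 1000, "DEL"),
]
merged = merge_calls(example_calls)
```

Because each call is compared only with the last member of the current cluster, a long chain of nearby calls can end up in a single cluster. A stricter rule, such as comparing with the cluster's first member, may be preferable depending on the data.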
