@@ -4,11 +4,11 @@ title: Duplicate Marking Methodology
## Differences between WGS and WGBS Duplicate Marking

- Duplicate reads primarily come from two sources: polymerase chain reaction (PCR) and optical duplicates. Optical
- duplicates arise from the sequencer splitting a single cluster into two or more clusters (see the description in the
- [paper](https://doi.org/10.1093/bioinformatics/btad729) for where these clusters come from). While `dupsifter` is able
- to handle optical duplicates, they affect WGS and WGBS datasets in the same way. Therefore, this section will focus on
- PCR duplicates and the differences in duplicate marking reads from these two technologies.
+ Duplicate reads primarily come from two sources: polymerase chain reaction (PCR) amplification and optical duplicates.
+ Optical duplicates arise from the sequencer splitting a single cluster into two or more clusters (see the description in
+ the [paper](https://doi.org/10.1093/bioinformatics/btad729) for where these clusters come from). While `dupsifter` is
+ able to handle optical duplicates, they affect WGS and WGBS datasets in the same way. Therefore, this section will focus
+ on PCR duplicates and the differences in duplicate marking reads from these two technologies.

PCR amplification is frequently used in WGS and WGBS to increase the amount of input DNA, which increases the chance of
a DNA fragment being sequenced, but incurs a cost of some fragments being sequenced more than once. PCR duplicates are
@@ -21,7 +21,7 @@ these strands (CTOT and CTOB, respectively). This additional step means there ar
DNA fragment in WGBS versus only one in WGS. Therefore, for WGBS experiments, we must distinguish between reads coming
from the OT and OB strands at the same location and true PCR duplicates.

- `dupsifter` handles these differences by also factoring in the bisulfite strand (OT/CTOT or OB/CTOB) when determining if
+ `dupsifter` handles these differences by factoring in the bisulfite strand (OT/CTOT or OB/CTOB) when determining if
a read is a duplicate. In the case where the user is running in WGS mode, `dupsifter` treats all reads as coming from
the same original strand.
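
The strand-aware comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not `dupsifter`'s actual implementation: the `Read` record, the `duplicate_key` and `mark_duplicates` helpers, and the "first read seen is kept" rule are all hypothetical, and a real tool would also need to handle read pairs, clipping, and orientation.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical, heavily simplified read record (not dupsifter's data model)."""
    name: str
    chrom: str
    pos: int        # alignment start position
    bs_strand: str  # "OT" for OT/CTOT reads, "OB" for OB/CTOB reads


def duplicate_key(read: Read, wgs_mode: bool) -> Tuple[str, int, str]:
    # In WGS mode, every read is treated as coming from the same original
    # strand, so the strand component of the key is a constant.
    strand = "OT" if wgs_mode else read.bs_strand
    return (read.chrom, read.pos, strand)


def mark_duplicates(reads: List[Read], wgs_mode: bool = False) -> List[bool]:
    # Assumption: the first read seen with a given key is kept, and any
    # later read with the same key is flagged as a duplicate.
    seen: Set[Tuple[str, int, str]] = set()
    flags: List[bool] = []
    for read in reads:
        key = duplicate_key(read, wgs_mode)
        flags.append(key in seen)
        seen.add(key)
    return flags


reads = [
    Read("r1", "chr1", 100, "OT"),
    Read("r2", "chr1", 100, "OB"),  # same location, other bisulfite strand
    Read("r3", "chr1", 100, "OT"),  # same location and strand as r1
]

print(mark_duplicates(reads))                 # [False, False, True]
print(mark_duplicates(reads, wgs_mode=True))  # [False, True, True]
```

In WGBS mode, `r2` is not flagged because it comes from the OB strand, while `r3` matches `r1` on both location and strand. In WGS mode the strand is ignored, so `r2` is flagged as well.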