terminology_and_concepts.md
Lines changed: 21 additions, 11 deletions
@@ -416,28 +416,38 @@ there are multiple, overlaid ancestral recombination events.
 
 ### Tree sequences and ARGs
 
-Much of the literature on ancestral inference concentrates on the Ancestral Recombination
-Graph, or ARG, in which details of the position and potentially the timing of
-recombination events are explictly stored. Although a tree sequence *can* represent such
-an ARG, by incorporating nodes that represent recombination events (see the
-{ref}`sec_args` tutorial), this is not normally done for two reasons:
+::::{margin}
+:::{note}
+There is a subtle distinction between common ancestry and coalescence. In particular, all coalescent nodes are common ancestor events, but not all common ancestor events in an ARG result in coalescence in a local tree.
+:::
+::::
+
+The term "Ancestral Recombination Graph", or ARG, is commonly used to describe a genetic
+genealogy. In particular, many (but not all) authors use it to mean a genetic
+genealogy in which details of the position and potentially the timing of all
+recombination and common ancestor events are explicitly stored. For clarity
+we refer to this sort of genetic genealogy as a "full ARG". Succinct tree sequences can
+represent many different sorts of ARGs, including "full ARGs", by incorporating extra
+non-coalescent nodes (see the {ref}`sec_args` tutorial). However, tree sequences are
+often shown and stored in {ref}`fully simplified<sec_simplification>` form,
+which omits these extra nodes. This is for two main reasons:
 
 1. Many recombination events are undetectable from sequence data, and even if they are
 detectable, they can be logically impossible to place in the genealogy (as in the
 second SPR example above).
-2. The number of recombination events in the genealogy can grow to dominate the total
-number of nodes in the total tree sequence, without actually contributing to the
-realised sequences in the samples. In other words, recombination nodes are redundant
-to the storing of genome data.
+2. The number of recombination and non-coalescing common ancestor events in the genealogy
+quickly grows to dominate the total number of nodes in the tree sequence,
+without actually contributing to the mutations inherited by the samples.
+In other words, these nodes are redundant to the storing of genome data.
 
-Therefore, compared to an ARG, you can think of a standard tree sequence as simply
+Therefore, compared to a full ARG, you can think of a simplified tree sequence as
 storing the trees *created by* recombination events, rather than attempting to record the
 recombination events themselves. The actual recombination events can be sometimes be
 inferred from these trees but, as we have seen, it's not always possible. Here's another
 way to put it:
 
 > "an ARG encodes the events that occurred in the history of a sample,
-> whereas a tree sequence encodes the outcome of those events"
+> whereas a [simplified] tree sequence encodes the outcome of those events"