Commit 1956970

Merge pull request #424 from nspope/docs-cite-singer
Make sure to cite SINGER in docs for rescaling idea
2 parents 5b4c8d8 + db20d23 commit 1956970

1 file changed

+19
-11
lines changed
docs/methods.md

Lines changed: 19 additions & 11 deletions
@@ -68,7 +68,7 @@ Pros
 : Old nodes do not suffer from time-discretisation issues caused by forcing
 bounds on the oldest times
 : Iterative updating properly accounts for cycles in the genealogy
-: No need to specify prior times
+: No need to specify node-specific priors; a mixture "prior" (fit by expectation-maximization) is used to regularise the roots.
 : Can account for variable population sizes using rescaling
 
 Cons
@@ -118,15 +118,23 @@ ts = tsdate.date(input_ts, mutation_rate=1e-8, progress=True)
 (sec_rescaling)=
 #### Rescaling
 
-During each EP step, the `variational_gamma` method implements a further process
-that we call *rescaling*, and which can help to deal with the effects of variable population
-size though time. Basically, time is broken up into a number of intervals, and times within
-intervals are simultaneously scaled such that the expected density of mutations along each
-path from a sample to the root best matches the mutational density predicted from the
-user-provided mutation rate. The number of intervals can be specified using the
-`rescaling_intervals` parameter. If set to 0, no rescaling is performed; this means that
-dates may be inaccurately estimated if the dataset comes from a set of samples with a complex
-demographic history.
+During each EP step, the `variational_gamma` method implements a further
+process called *rescaling*, and which can help to deal with the effects of
+variable population size though time. This is based on an algorithm introduced
+by the ARG inference software
+[SINGER](https://doi.org/10.1101/2024.03.16.585351) (Deng et al 2024) that
+rescales node ages by matching observed and expected segregating sites within
+time windows.
+Basically, time is broken up into a number of intervals, and times within
+intervals are simultaneously scaled such that the expected density of mutations
+along each path from a sample to the root best matches the mutational density
+predicted from the user-provided mutation rate. The number of intervals can be
+specified using the `rescaling_intervals` parameter. If set to 0, no rescaling
+is performed; this means that dates may be inaccurately estimated if the
+dataset comes from a set of samples with a complex demographic history.
+`tsdate` uses a modified version of Deng et al's algorithm that works on gamma
+natural parameters rather than point estimates, and that is not biased by the
+artefactual polytomies introduced by `tsinfer` for the sake of compression.
 
 TODO: describe the rescaling step in more detail. Could also link to [the population size docs](sec_popsize)
 
@@ -174,4 +182,4 @@ have no mapped mutations (e.g. in the centromere), which can be removed by
 
 The `maximization` approach is slightly less accurate empirically,
 and will not return true posteriors, but is theoretically robust and
-additionally is always numerically stable.
+additionally is always numerically stable.
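
The rescaling paragraph in the `docs/methods.md` hunk describes breaking time into intervals and scaling each interval so that expected and observed mutation counts agree. The following is a minimal toy sketch of that general idea, not `tsdate`'s implementation: it works on point estimates of node times (whereas the docs say `tsdate` works on gamma natural parameters), it sums over edges rather than sample-to-root paths, and the names `window_scales` and `rescale_time` and the edge tuple layout are hypothetical, chosen only for illustration.

```python
def window_scales(edges, breaks, mu):
    """Per-window scale factors (observed / expected mutations).

    edges  : list of (child_time, parent_time, span, n_mutations); hypothetical layout
    breaks : increasing window boundaries, e.g. [0, 5, 10]
    mu     : mutation rate per unit span per unit time
    """
    n = len(breaks) - 1
    observed = [0.0] * n
    expected = [0.0] * n
    for child, parent, span, muts in edges:
        length = parent - child
        if length <= 0:
            continue  # skip degenerate edges
        for i in range(n):
            # Amount of this edge's time interval that falls inside window i.
            overlap = min(parent, breaks[i + 1]) - max(child, breaks[i])
            if overlap > 0:
                expected[i] += mu * span * overlap
                # Apportion the edge's mutations in proportion to the overlap.
                observed[i] += muts * overlap / length
    # Windows with no expected mutations are left unscaled.
    return [o / e if e > 0 else 1.0 for o, e in zip(observed, expected)]


def rescale_time(t, breaks, scales):
    """Map a time t through the piecewise-linear rescaling defined by the windows."""
    new_t = 0.0
    for i, s in enumerate(scales):
        lo, hi = breaks[i], breaks[i + 1]
        if t <= lo:
            break
        new_t += s * (min(t, hi) - lo)
    if t > breaks[-1]:
        # Assumption: times beyond the last boundary reuse the last scale factor.
        new_t += scales[-1] * (t - breaks[-1])
    return new_t


# Two windows; the older window carries three times as many mutations per unit
# of branch length, so times in it are stretched three times as much.
edges = [(0.0, 5.0, 1000.0, 1), (5.0, 10.0, 1000.0, 3)]
breaks = [0.0, 5.0, 10.0]
scales = window_scales(edges, breaks, mu=1e-4)
rescaled_root = rescale_time(10.0, breaks, scales)
```

In this toy, a window whose branches carry more mutations than the mutation rate predicts is stretched, and one with fewer is compressed. Setting the number of windows to zero corresponds to applying no rescaling at all.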
