|
68 | 68 | : Old nodes do not suffer from time-discretisation issues caused by forcing
|
69 | 69 | bounds on the oldest times
|
70 | 70 | : Iterative updating properly accounts for cycles in the genealogy
|
71 |
| -: No need to specify prior times |
| 71 | +: No need to specify node-specific priors; a mixture "prior" (fit by expectation-maximization) is used to regularise the roots. |
72 | 72 | : Can account for variable population sizes using rescaling
|
73 | 73 |
|
74 | 74 | Cons
|
@@ -118,15 +118,23 @@ ts = tsdate.date(input_ts, mutation_rate=1e-8, progress=True)
|
118 | 118 | (sec_rescaling)=
|
119 | 119 | #### Rescaling
|
120 | 120 |
|
121 |
| -During each EP step, the `variational_gamma` method implements a further process |
122 |
| -that we call *rescaling*, and which can help to deal with the effects of variable population |
123 |
| -size though time. Basically, time is broken up into a number of intervals, and times within |
124 |
| -intervals are simultaneously scaled such that the expected density of mutations along each |
125 |
| -path from a sample to the root best matches the mutational density predicted from the |
126 |
| -user-provided mutation rate. The number of intervals can be specified using the |
127 |
| -`rescaling_intervals` parameter. If set to 0, no rescaling is performed; this means that |
128 |
| -dates may be inaccurately estimated if the dataset comes from a set of samples with a complex |
129 |
| -demographic history. |
| 121 | +During each EP step, the `variational_gamma` method implements a further |
| 122 | +process called *rescaling*, which can help to deal with the effects of |
| 123 | +variable population size through time. This is based on an algorithm introduced |
| 124 | +by the ARG inference software |
| 125 | +[SINGER](https://doi.org/10.1101/2024.03.16.585351) (Deng et al. 2024) that |
| 126 | +rescales node ages by matching observed and expected segregating sites within |
| 127 | +time windows. |
| 128 | +Basically, time is broken up into a number of intervals, and times within |
| 129 | +intervals are simultaneously scaled such that the expected density of mutations |
| 130 | +along each path from a sample to the root best matches the mutational density |
| 131 | +predicted from the user-provided mutation rate. The number of intervals can be |
| 132 | +specified using the `rescaling_intervals` parameter. If set to 0, no rescaling |
| 133 | +is performed; this means that dates may be inaccurately estimated if the |
| 134 | +dataset comes from a set of samples with a complex demographic history. |
| 135 | +`tsdate` uses a modified version of Deng et al.'s algorithm that works on gamma |
| 136 | +natural parameters rather than point estimates, and that is not biased by the |
| 137 | +artefactual polytomies introduced by `tsinfer` for the sake of compression. |
130 | 138 |
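The windowed matching idea described above can be illustrated with a toy sketch. This is not the `tsdate` implementation (which works on gamma natural parameters and corrects for polytomies); it assumes point estimates of node times, spreads each edge's mutations uniformly over its time span, and uses hypothetical helper names (`window_overlap`, `rescaling_factors`, `rescale_time`) and a made-up two-edge dataset. Within each time window, the factor is the ratio of observed mutations to the number expected from the mutation rate, and times are then mapped through the resulting piecewise-linear function.

```python
import math


def window_overlap(lo, hi, start, end):
    """Length of the intersection of the intervals [lo, hi] and [start, end]."""
    return max(0.0, min(hi, end) - max(lo, start))


def rescaling_factors(edges, breaks, mutation_rate):
    """One scaling factor per time window (toy version, point estimates only).

    edges: list of (child_time, parent_time, span, n_mutations) tuples.
    breaks: increasing window boundaries; the last may be math.inf.
    """
    factors = []
    for start, end in zip(breaks[:-1], breaks[1:]):
        expected = 0.0
        observed = 0.0
        for child_t, parent_t, span, n_muts in edges:
            overlap = window_overlap(child_t, parent_t, start, end)
            # Mutations expected on the part of this edge inside the window
            expected += mutation_rate * span * overlap
            # Assumption: an edge's mutations are spread evenly over its duration
            observed += n_muts * overlap / (parent_t - child_t)
        # Leave windows that contain no edge area unscaled
        factors.append(observed / expected if expected > 0 else 1.0)
    return factors


def rescale_time(t, breaks, factors):
    """Map an old time to a new one by stretching each window by its factor."""
    new_t = 0.0
    for start, end, factor in zip(breaks[:-1], breaks[1:], factors):
        new_t += factor * window_overlap(0.0, t, start, end)
    return new_t


# Made-up example: two edges and two time windows
edges = [
    (0.0, 100.0, 1000.0, 2),    # more mutations than expected: window is stretched
    (100.0, 300.0, 1000.0, 1),  # fewer mutations than expected: window is shrunk
]
breaks = [0.0, 100.0, math.inf]
factors = rescaling_factors(edges, breaks, mutation_rate=1e-5)
new_times = [rescale_time(t, breaks, factors) for t in (100.0, 300.0)]
print(factors, new_times)
```

In this sketch the number of windows plays the role that `rescaling_intervals` plays in the text: with a single window, all times are scaled by one global factor, while more windows allow the scaling to vary with age.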
|
131 | 139 | TODO: describe the rescaling step in more detail. Could also link to [the population size docs](sec_popsize)
|
132 | 140 |
|
@@ -174,4 +182,4 @@ have no mapped mutations (e.g. in the centromere), which can be removed by
|
174 | 182 |
|
175 | 183 | The `maximization` approach is slightly less accurate empirically,
|
176 | 184 | and will not return true posteriors, but is theoretically robust and
|
177 |
| -additionally is always numerically stable. |
| 185 | +additionally is always numerically stable. |