Skip to content

A fairly large reworking to allow entire method objects to be returned #434

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Nov 9, 2024
Merged
14 changes: 14 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,20 @@

## [0.2.2] - 2024-XX-XX

**Documentation**

- Various fixes in documentation, including documenting returned models.

**Breaking changes**

- The `return_posteriors` argument has been removed and replaced with `return_model`.
An instance of one of two previously internal classes, `ExpectationPropagationModel`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! But I have one very pedantic request-- can we not add "Model" to these class names, as they're not models but algorithms to fit models. We could prepend "Algo" or "Fit" instead, or leave as-is.

Copy link
Member Author

@hyanwong hyanwong Nov 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was also a little uncomfortable about "model", but I want to be able to say

ts, something = tsdate.date(ts, mu, return_something=True)

Have you got a better suggestion for "something". I guess as you say, "fit" might work actually?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"return_fit" is fine, I think. I'd prefer to keep the class name as "ExpectationPropagation", no need to add a suffix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I completely see your point: "fit" is more appropriate and accurate than "model" (as well as it being shorter to type), so thanks for that suggestion.

The equivalent to the "ExpectationPropagation" class is now called "InsideOutside", but I'm not so happy with that name, partly because that fit object is returned from both the inside-outside and the outside-maximisation methods. Do you by any chance have a better suggestion for a name describing that non-variational approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Of course, it could be that it's possible to implement EP with the discrete-time method too, but let's cross that bridge later, if we ever come to it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you by any chance have a better suggestion for a name describing that non-variational approach?

BeliefPropagation?

and `InOutModel`, are now returned when `return_model=True`, and posteriors can
be obtained using `model.node_posteriors()`.

- Topology-only dating (setting `mutation_rate=None`) has been removed for tree sequences
of more than one tree, as tests have found that span-weighting the conditional coalescent
causes substantial bias.

## [0.2.1] - 2024-07-31

Expand Down
4 changes: 2 additions & 2 deletions docs/Makefile
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Need to set PYTHONPATH so that we pick up the local tsdate
PYPATH=$(shell pwd)/../
TSDATE_VERSION:=$(shell PYTHONPATH=${PYPATH} \
python3 -c 'import tsdate; print(tsdate.__version__.split("+")[0])')
python -c 'import tsdate; print(tsdate.__version__.split("+")[0])')

BUILDDIR = _build

Expand All @@ -16,4 +16,4 @@ dist:
PYTHONPATH=${PYPATH} ./build.sh

clean:
rm -fR $(BUILDDIR)
rm -fR $(BUILDDIR)
1 change: 1 addition & 0 deletions docs/_config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ html:
sphinx:
extra_extensions:
- sphinx.ext.autodoc
- sphinx.ext.mathjax
- sphinx.ext.autosummary
- sphinx.ext.todo
- sphinx.ext.viewcode
Expand Down
18 changes: 16 additions & 2 deletions docs/python-api.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,13 +32,27 @@ This page provides formal documentation for the _tsdate_ Python API.
.. autofunction:: tsdate.maximization
```

## Underlying models

Instances of these models are returned by setting `return_model=True` when
dating. The models can be inspected to obtain more detailed results than
may be present in the returned tree sequence and its metadata.

```{eval-rst}
.. autoclass:: tsdate.discrete.InOutModel
:members:

.. autoclass:: tsdate.variational.ExpectationPropagationModel
:members:
```

## Prior and Time Discretisation Options

```{eval-rst}
.. autofunction:: tsdate.build_prior_grid
.. autofunction:: tsdate.build_parameter_grid
.. autoclass:: tsdate.base.NodeGridValues
.. autodata:: tsdate.base.DEFAULT_APPROX_PRIOR_SIZE
.. autoclass:: tsdate.node_time_class.NodeTimeValues
.. autodata:: tsdate.prior.DEFAULT_APPROX_PRIOR_SIZE
```

## Variable population sizes
Expand Down
21 changes: 9 additions & 12 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -175,23 +175,20 @@ running _tsdate_:
The metadata values (currently saved as `mn` and `vr`) need not be constrained by
the topology of the tree sequence, and should be used in preference
e.g. to `nodes_time` and `mutations_time` when evaluating the accuracy of _tsdate_.
2. The `return_posteriors` parameter can be used when calling {func}`tsdate.date`, which
then returns both the dated tree sequence and a dictionary specifying the posterior
distributions.

<!--
The returned posterior is a dictionary keyed by integer node ID, with values representing the
probability distribution of times. This can be read in to a [pandas](https://pandas.pydata.org)
dataframe:
2. The `return_model` parameter can be used when calling {func}`tsdate.date`, which
then returns both the dated tree sequence and a model. This model can then be queried
for the posterior distributions using e.g. `.node_posteriors()` which can be read in
to a [pandas](https://pandas.pydata.org) dataframe, as below:

```{code-cell} ipython3
import pandas as pd
redated_ts, posteriors = tsdate.date(
sim_ts, mutation_rate=1e-6, method="inside_outside", return_posteriors=True)
posteriors_df = pd.DataFrame(posteriors)
posteriors_df.head() # Show the dataframe
redated_ts, model = tsdate.date(
sim_ts, mutation_rate=1e-6, return_model=True)
posteriors_df = pd.DataFrame(model.node_posteriors())
posteriors_df.tail() # Show the dataframe
```

<!--
Since we are using a {ref}`sec_methods_discrete_time` method, each node
(numbered column of the dataframe) is associated with a vector of probabilities
that sum to one: each cell gives the probability that the time of the node
Expand Down
4 changes: 1 addition & 3 deletions tests/test_accuracy.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,9 +108,7 @@ def test_basic(
assert sim_mutations_parameters["command"] == "sim_mutations"
mu = sim_mutations_parameters["rate"]

dts, posteriors = tsdate.inside_outside(
ts, population_size=Ne, mutation_rate=mu, return_posteriors=True
)
dts = tsdate.inside_outside(ts, population_size=Ne, mutation_rate=mu)
# make sure we can read node metadata - old tsdate versions didn't set a schema
if dts.table_metadata_schemas.node.schema is None:
tables = dts.dump_tables()
Expand Down
7 changes: 4 additions & 3 deletions tests/test_cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -232,10 +232,9 @@ def test_no_output_variational_gamma(self, tmp_path, capfd):

@pytest.mark.parametrize(("flag", "log_status"), logging_flags.items())
def test_verbosity(self, tmp_path, caplog, flag, log_status):
popsize = 10000
ts = msprime.simulate(
10,
Ne=popsize,
Ne=10000,
mutation_rate=1e-8,
recombination_rate=1e-8,
length=2e4,
Expand All @@ -246,7 +245,9 @@ def test_verbosity(self, tmp_path, caplog, flag, log_status):
caplog.set_level(getattr(logging, log_status))
# either tsdate preprocess or tsdate date (in_out method has debug asserts)
self.run_tsdate_cli(tmp_path, ts, flag, cmd="preprocess")
self.run_tsdate_cli(tmp_path, ts, f"-n 10 --method inside_outside {flag}")
self.run_tsdate_cli(
tmp_path, ts, f"--mutation-rate 1e-8 --rescaling-intervals 0 {flag}"
)
assert log_status in caplog.text

@pytest.mark.parametrize(
Expand Down
Loading
Loading