Implement state machine for tsk_variant_t and keep reference to tree_sequence in restricted_copy. #2436

@jeromekelleher

We are currently using the tree_sequence attribute as a way of determining whether a variant is a frozen copy or not. We also use the variant->site.position attribute as a way of determining whether the variant has been decoded. It would be simpler if we had a single state machine, which supported the following transitions:

VARIANT_STATE_NEW -> VARIANT_STATE_DECODED
VARIANT_STATE_DECODED -> VARIANT_STATE_DECODED
VARIANT_STATE_DECODED -> VARIANT_STATE_FROZEN_COPY
VARIANT_STATE_FROZEN_COPY -> VARIANT_STATE_FROZEN_COPY

Thus,

  • If state == VARIANT_STATE_NEW then tsk_variant_restricted_copy should fail.
  • If state == VARIANT_STATE_FROZEN_COPY then tsk_variant_decode should fail.
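
The transitions and failure cases above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tskit implementation: the `sketch_` types and functions and the `SKETCH_ERR_BAD_STATE` code are hypothetical names, the real decode and copy work is elided, and only the three state names are taken from the issue text.

```c
#include <stddef.h>

/* Hypothetical sketch: apart from the three state names, none of these
 * identifiers come from tskit. */
typedef enum {
    VARIANT_STATE_NEW,
    VARIANT_STATE_DECODED,
    VARIANT_STATE_FROZEN_COPY
} variant_state_t;

/* Hypothetical error code for an operation not allowed in the current state. */
#define SKETCH_ERR_BAD_STATE (-1)

typedef struct {
    variant_state_t state;
    /* Kept non-NULL in frozen copies too, so it no longer doubles as a flag. */
    const void *tree_sequence;
} sketch_variant_t;

int
sketch_variant_init(sketch_variant_t *self, const void *tree_sequence)
{
    self->state = VARIANT_STATE_NEW;
    self->tree_sequence = tree_sequence;
    return 0;
}

/* NEW -> DECODED and DECODED -> DECODED are allowed; FROZEN_COPY fails. */
int
sketch_variant_decode(sketch_variant_t *self)
{
    if (self->state == VARIANT_STATE_FROZEN_COPY) {
        return SKETCH_ERR_BAD_STATE;
    }
    /* The actual genotype decoding would happen here. */
    self->state = VARIANT_STATE_DECODED;
    return 0;
}

/* DECODED -> FROZEN_COPY and FROZEN_COPY -> FROZEN_COPY are allowed; NEW fails. */
int
sketch_variant_restricted_copy(const sketch_variant_t *self, sketch_variant_t *other)
{
    if (self->state == VARIANT_STATE_NEW) {
        return SKETCH_ERR_BAD_STATE;
    }
    /* The actual copying of alleles, genotypes and site data would happen here. */
    other->tree_sequence = self->tree_sequence;
    other->state = VARIANT_STATE_FROZEN_COPY;
    return 0;
}
```

With a single state field, both checks read one value rather than inferring state from tree_sequence or site.position, and the copy can keep its tree_sequence reference.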

The current approach of using the tree_sequence attribute is problematic because:

  • We document this attribute, but we do not document that it is currently NULL for frozen copies.
  • A frozen copy still refers to memory owned by the tree sequence of the initial variant through the site copy (e.g., the list of mutations still points to memory from the original ts). Thus, we still have a dependency on the original ts. Note that we're currently getting away with this dependence in the Python C API layer because we don't refer to the pointers within the site reference, but we could easily forget this some day.

I think we can also remove some complexity in tsk_variant_restricted_copy because we can then avoid taking copies of the alleles in the user_alleles memory.

Labels

  • C API: Issue is about the C API
  • Python API: Issue is about the Python API
  • future: Issues that are closed as they are not planned in the medium-term, but which are still desirable.
