Skip to content

Make the WCS evaluation API uniform #561

@WilliamJamieson

Description

@WilliamJamieson

Overview

As described by #558 (comment), API for evaluating the WCS's transforms is not the same between different methods for evaluation such as .__call__, .invert, and .transform. Moreover, there is quite a bit of code
path duplication related to all these methods that should be streamlined.

Note that #559, #557, #445, #550 are likely related to this.

Goals

For GWCS 1.0 we should have the goals to:

  1. Have a well defined API for all the different methods for evaluating the GWCS.
  2. This API should attempt to be intuitive in its operation by default whenever possible. The output type should follow from the input type.
  3. The API should be as explicit as possible, especially when having it work in a no default fashion.
  4. Conform to the APE 14 specifications where necessary.
  5. Be well documented.

Background

All this being said the main crux of the issue follows from an "oversight" in APE 14's description of its interface. In APE 14, the notion of "high-level" and "low-level" objects is introduced, where

  • "low-level" objects are pure scalars and arrays (NumPy arrays) without any descriptive metadata. It is not entirely clear, if it allows more descriptive objects which can "act" like scalars and array, where in those interfaces will ignore any of this metadata.
  • "high-level" objects contain not only the data but also descriptive information. Effectively, this is describing things like astropy's Quantity and astropy.coordinates "coordinate" objects.

Additionally, APE 14 only discusses the "pixel" and "world" systems with no discussion of intermediates like GWCS does; meaning that, all it discusses is the pixel-to-world and world-to-pixel coordinate transforms. The "oversight" in APE 14 lies in the fact that it is implicitly assuming that there are no "high-level" objects for the "pixel" coordinates. Namely, it describes

In our terminology, pixel is a generalized concept of some form of ordered in-memory (or on-disk) storage of data.

Essentially, this means that APE 14 is assuming the pixel coordinates are just arrays whose index conveys all the metadata. In some sense, it is assuming that the pixel inputs are always in u.pix units. Though it is not explicitly stating this. The consequence of this is that APE 14 creates an ambiguity about what "high-level" means in the pixel coordinates.

This oversight, does not effect APE 14 itself because it is limited to only the "pixel" and "world" coordinates. This however creates an ambiguity for GWCS. Since GWCS is describing more than just pixel and world coordinates, namely "intermediate" coordinates, but is framing all the transforms through the APE 14 "pixel" and "world" coordinates. This means that transforms moving "forward" (towards the world) assume that Quantity objects inputs are "low-level" in a lot of ways, while at the same time assuming they are "high-level" objects as outputs. This means that for the non APE 14 methods like .__call__ and .invert, Quantity inputs are not treated the same which is not intuitive.

A partial attempt to resolve this issue was introduced to GWCS with the with_units key word argument, which is a simple boolean True/False. As this argument is implemented, it creates more issues than it solves. For example if one passes a Quantity to __call__, intuitively one might expect to get a Quantity back, but with_units defaults to False meaning that it in theory should not be returning a Quantity by default. However, as #558 has uncovered, this is not always the case. Moreover, if one then looks at .invert things are even more muddy. Essentially, with_units is doing more harm than good.

Moreover, there are code path issues involving .invert involving:

gwcs/gwcs/wcs/_wcs.py

Lines 302 to 303 in bfedbdf

if is_high_level(*args, low_level_wcs=self):
args = high_level_objects_to_values(*args, low_level_wcs=self)
while .__call__ does not. There are also duplicate code paths, and even more things muddying up behavior. Further analysis also leads to subtle inconsistencies arising in all this with respect to the APE 14 defined API.

Proposal

As @chris-simpson has pointed out in #558 (comment) the behavior in all these cases should not be different. I would also interpret some of the concerns raised in #558 as falling into the category of GWCS's API being "surprising" in its behavior.

Thus I propose that several actions be taken:

  1. Create specific, unambiguous, and easy to find documentation defining "high-level" and "low-level" objects for GWCS.
    • We should attempt to indicate this somehow in the API "type" listing in the docstrings/API docs.
    • For clear reasons, this is important for us to lay out as in some sense is what has caused the API issues to start with.
    • Possibly submit and update to APE 14 so that it is more clear about this as well?
  2. All GWCS "evaluation" API calls should try to funnel into a single code path rather than many duplicate ones. That is there are many cases of functions duplicating the same or similar code paths. This should be centralized into a single generalized "evaluation" method that the specific "evaluation" methods all create inputs for.
    • I propose

      gwcs/gwcs/wcs/_wcs.py

      Lines 1060 to 1082 in bfedbdf

      def transform(
      self,
      from_frame: str | CoordinateFrame,
      to_frame: str | CoordinateFrame,
      *args,
      with_units: bool = False,
      **kwargs,
      ):
      """
      Transform positions between two frames.
      Parameters
      ----------
      from_frame : str or `~gwcs.coordinate_frames.CoordinateFrame`
      Initial coordinate frame.
      to_frame : str, or instance of `~gwcs.coordinate_frames.CoordinateFrame`
      Coordinate frame into which to transform.
      args : float or array-like
      Inputs in ``from_frame``, separate inputs for each dimension.
      with_bounding_box : bool, optional
      If True(default) values in the result which correspond to any of
      the inputs being outside the bounding_box are set to ``fill_value``.
      """
      It is supposed to be the general case for evaluating the transform between any pair of pipeline steps.
    • By choosing this function any pre and post processing such as the high_level_objects_to_value calls can be centralized so that there is no behavioral deviations in other specific calls.
    • Basically, one expects GWCS's evaluation to act the same way no matter what "kind" of evaluation one is taking about.
  3. Remove (deprecate?) and replace the with_units key word boolean with an output_level key word string.
    • The term with_units is a bit misleading because it can in fact do more than just output Quantity objects, it basically casts to "high-level" objects that are referenced in the CoordinateFrame object for the target (such as "world") frame. Something like output_level more clearly describes the intent as controlling what you get out and references the notion "object" level woven throughout this discussion.
      • I am open to alternate names to output_level.
    • output_level should be a string literal rather than a boolean so that it is descriptive and has more than two options. I suggest:
      • "match" as the default. Meaning the "level" of the output matches the "level" of the input. Meaning if you pass a Quantity you expect to get the listed "high-level" object for the to_frame, which maybe a Quantity but could also be something like SkyCoord. Similarly, passing a pure array or scalar would result in a pure array or scalar.
      • "high", meaning the level of the output should be the "high-level" listed by the to_frame.
      • "low", meaning the level of the output should be a scalar or array.
    • By being a literal string, this leaves us the opportunity to extend this behavior easily if we come across more cases to account for.
    • By having "match" as the default we follow the "least surprising" idea.
  4. Add something like an .evaluate method that acts so that .evaluate(..., direction="forward") evaluates the full forward transform from the pixel frame to the world frame and .evaluate(..., direction="backward") evaluates the full backward transform from the world frame to the pixel frame.
    • For those wishing be explicit in their usage, this acts as a clear way to indicate how evaluation is occurring. It also indicates a symmetry in the API.
    • .__call__ and .invert should effectively be aliases of .evaluate(direction="forward") and .evaluate(direction="backward") respectively.
    • Default should be direction="forward, so this acts as a "match" to .invert in some sense.
  5. Review and update the APE 14 related API so that it conforms with APE 14 AND uses the same code paths as the rest of GWCS.
  6. Provide a detailed documentation of the internal implementation mechanisms. Much of this is about keep the internal implementation consistent. I don't want us to deviate from this by accident or misunderstanding of the code.

Additional Notes

Warning

It is important to note that despite its age, GWCS has still not made a stable release. This mean's its API and functionality is still evolving. While reasonable attempts will be made to avoid breaking changes with older versions, this may happen accidentally or deliberately. Our goal is to move towards a clean and consistent API that we can maintain in a stable version soon.

  • Hopefully, finalizing this work will be (one) of the last steps prior to a stable release.
  • Downstream users are strongly encouraged to provide their opinions and feedback towards the goals laid out here.
  • We recommend that downstream users test their code against the API changes and provide prompt feedback. We are more than willing to assist you if we break things and cannot easily find a way to provide a bridging method for backwards compatibility.
    • Indeed, we strongly encourage our downstream users to request (or submit a PR) adding their package to the GWCS downstream testing. "Real" use cases are extremely useful and this will help us avoid disruption to those users as much as possible.

Sub-issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions