Skip to content

Conversation

schlunma
Copy link
Contributor

@schlunma schlunma commented Jul 29, 2025

Description

This PR adds the keyword argument keep_group_coordinates to temporal statistics preprocessors which controls how group coordinates (like year, day_of_year) are handled. Previously, each preprocessor handled this differently. Affected preprocessors:

  • hourly_statistics (default: keep_group_coordinates=False)
  • daily_statistics (default: keep_group_coordinates=False)
  • monthly_statistics (default: keep_group_coordinates=True)
  • seasonal_statistics (default: keep_group_coordinates=True)
  • annual_statistics (default: keep_group_coordinates=True)
  • decadal_statistics (default: keep_group_coordinates=True)

The different default values ensure full backwards-compatibility.

See ESMValGroup/ESMValTool#4129.


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@schlunma schlunma added this to the v2.13.0 milestone Jul 29, 2025
@schlunma schlunma requested a review from LisaBock July 29, 2025 14:40
@schlunma schlunma added the preprocessor Related to the preprocessor label Jul 29, 2025
Copy link

codecov bot commented Jul 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.52%. Comparing base (357b8bd) to head (478a201).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2787      +/-   ##
==========================================
+ Coverage   95.45%   95.52%   +0.06%     
==========================================
  Files         260      260              
  Lines       15496    15508      +12     
==========================================
+ Hits        14792    14814      +22     
+ Misses        704      694      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@schlunma
Copy link
Contributor Author

To properly unify this, it would be nice to use the same default value for keep_group_coordinates in all preprocessors. However, this would not be fully backwards-compatible:

  • keep_group_coordinates=False: even though I would prefer this option, this would probably lead to many problems with subsequent code (e.g., diagnostics) that expect certain coordinates to be present. This would also mean that the default behavior of 4 of 6 preprocessors will change.
  • keep_group_coordinates=True: this should be relatively safe since the presence of coordinates is usually a much smaller problem than their absence. In this case, the default behavior of only 2 of 6 preprocessors would change.

@ESMValGroup/technical-lead-development-team opinions?

@schlunma schlunma removed the request for review from LisaBock July 30, 2025 13:55
@jlenh
Copy link
Contributor

jlenh commented Aug 18, 2025

To properly unify this, it would be nice to use the same default value for keep_group_coordinates in all preprocessors. However, this would not be fully backwards-compatible:

  • keep_group_coordinates=False: even though I would prefer this option, this would probably lead to many problems with subsequent code (e.g., diagnostics) that expect certain coordinates to be present. This would also mean that the default behavior of 4 of 6 preprocessors will change.
  • keep_group_coordinates=True: this should be relatively safe since the presence of coordinates is usually a much smaller problem than their absence. In this case, the default behavior of only 2 of 6 preprocessors would change.

@ESMValGroup/technical-lead-development-team opinions?

Not sure where we stand on this. Maybe it would require some further discussions at the next @ESMValGroup/technical-lead-development-team meeting before considering merging?

@bouweandela
Copy link
Member

even though I would prefer this option, this would probably lead to many problems with subsequent code (e.g., diagnostics) that expect certain coordinates to be present.

Could you do a grep through the recipes in ESMValTool so we can get an estimate of how many recipes would be affected?

@jlenh
Copy link
Contributor

jlenh commented Aug 19, 2025

even though I would prefer this option, this would probably lead to many problems with subsequent code (e.g., diagnostics) that expect certain coordinates to be present.

Could you do a grep through the recipes in ESMValTool so we can get an estimate of how many recipes would be affected?

Long story short, following Manuel's initial comment, switching by default to use keep_group_coordinates=True for all preprocessors for minimal changes would impact three cmorizers (WFDE5, ESA-CCI Cloud, ERA-Interim) and a recipe (ERA5 daily).

(Hopefully did not miss anything while copying)

  • hourly_statistics (default: keep_group_coordinates=False)
    • Did not come up in cmorizers and/or diagnostic scripts and/or recipes
  • daily_statistics (default: keep_group_coordinates=False)
  • monthly_statistics (default: keep_group_coordinates=True)
  • seasonal_statistics (default: keep_group_coordinates=True)
    • Diagnostic eddy growth rate script
    • Recipe monitor script
    • Recipe climate change hotspots script
  • annual_statistics (default: keep_group_coordinates=True)
  • decadal_statistics (default: keep_group_coordinates=True)
    • Recipe ocean biogeochemistry script
    • Recipe Tebaldi 2021 ESD script

@bouweandela
Copy link
Member

Thank you for the summary, and I apologize that my question was not clear. What I meant was that I agree that setting keep_group_coordinates=False by default would be the cleanest solution. To estimate the impact of that, I would like to know the number of recipes with a diagnostic script that actually uses the group coordinates.

CMORizers should always discard the group coordinates, as these are not part of the variable definition in the CMOR tables.

@jlenh
Copy link
Contributor

jlenh commented Aug 20, 2025

Thank you for the summary, and I apologize that my question was not clear. What I meant was that I agree that setting keep_group_coordinates=False by default would be the cleanest solution. To estimate the impact of that, I would like to know the number of recipes with a diagnostic script that actually uses the group coordinates.

CMORizers should always discard the group coordinates, as these are not part of the variable definition in the CMOR tables.

Aaaah I understand now! The number of recipes is greatly reduced it seems (hopefully I haven't missed any).
Here is the summary then for the group coordinates with by default keep_group_coordinates=True:

  • monthly_statistics (default: keep_group_coordinates=True)
    • Recipe monitor script
    • Recipe ENSO Basic climatology script
    • Recipe climate patterns script
  • seasonal_statistics (default: keep_group_coordinates=True)
  • annual_statistics (default: keep_group_coordinates=True)
    • Recipe calculate GWL exceedance script
  • decadal_statistics (default: keep_group_coordinates=True)

@jlenh jlenh modified the milestones: v2.13.0, v2.14.0 Aug 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants