Caching in pytest #242

Open · aryanbhosale wants to merge 1 commit into main

Conversation

aryanbhosale (Member)

Pull Request

Description

@peterdudfield we will need to update the reusable workflow: openclimatefix/.github/.github/workflows/python-test.yml@issue/pip-all
to have something like this:

```yaml
jobs:
  run-python-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - uses: actions/setup-python@v3
        with:
          python-version: ${{ inputs.python-version }}

      - name: Cache pip
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt', '**/pyproject.toml', '**/poetry.lock') }}
          restore-keys: ${{ runner.os }}-pip-

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run test command
        run: |
          pytest \
          --cov="${{ inputs.pytest_cov_dir }}" \
          --cov-report=xml:coverage.xml \
          ${{ inputs.test_dir }}
```

so that whenever we call that reusable workflow, we will benefit from pip caching automatically.
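
For reference, a downstream caller of that reusable workflow might look roughly like this (a sketch only: the repository/ref comes from the path above, the input names are taken from the snippet, and the `pytest_cov_dir`/`test_dir` values are hypothetical placeholders):

```yaml
name: CI

on: [push, pull_request]

jobs:
  tests:
    # Call the reusable workflow; pip caching then happens inside it.
    uses: openclimatefix/.github/.github/workflows/python-test.yml@issue/pip-all
    with:
      python-version: "3.11"
      pytest_cov_dir: "my_package"   # hypothetical coverage target
      test_dir: "tests"              # hypothetical test directory
```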

Fixes #231

@peterdudfield (Contributor)

@zakwatts are you ok to look at this?

aryanbhosale requested a review from zakwatts on February 3, 2025, 11:57
@zakwatts (Contributor) commented Feb 4, 2025

Yes, happy to review.

Review comment on the diff, near these lines:

    python-version: 3.11

    - name: Cache pip

I think the slowdown is largely due to needing to download a model, rather than to missing pip caching.

Pip caching is already included in the workflow: https://github.com/openclimatefix/.github/blob/095e2489f12ba5269e4a6382b3db09d79c236ab6/.github/workflows/python-test.yml#L65

@zakwatts (Contributor) commented Feb 4, 2025

What we really want to cache is the model/data being downloaded

@zakwatts (Contributor) commented Feb 4, 2025

Looking into adding caching to the existing workflow: https://github.com/actions/cache

@zakwatts (Contributor) commented Feb 4, 2025

I think it might be easiest to copy the openclimatefix/.github workflow into the Open Quartz repo so that we can modify it, set up some custom caching, and keep it all in the same job run. Looking into this.

@zakwatts (Contributor) commented Feb 4, 2025

I think the primary issue is downloading the v2 model (models/v2/model_10_202405.ubj.zip), so I'll try to cache this.

@zakwatts (Contributor) commented Feb 4, 2025

The model is unzipped each time the tests run, which I think takes a decent amount of time.

@zakwatts (Contributor) commented Feb 4, 2025

[screenshots attached]

@zakwatts (Contributor) commented Feb 4, 2025

[screenshot attached]

@zakwatts (Contributor) commented Feb 4, 2025

The extra detail on the tests is quite useful. It looks like the "test forecast with no ts" test is taking up a lot of time. Given it's a unit test, we need to figure out why it takes so much longer in the full test run than when only the unit tests are run.
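
One way to surface this kind of per-test timing directly in CI (a suggestion, not necessarily how the screenshots above were produced) is pytest's built-in `--durations` flag, added to the existing run step from the description:

```yaml
- name: Run test command
  run: |
    pytest \
    --durations=20 \
    --cov="${{ inputs.pytest_cov_dir }}" \
    --cov-report=xml:coverage.xml \
    ${{ inputs.test_dir }}
```

This prints the 20 slowest test phases at the end of the run, which makes slow tests easier to spot without screenshots.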

@aryanbhosale (Member, Author)

@zakwatts so from your review, what I understand is that one of the unit tests is actually downloading and unpacking a large model on every run (models/v2/model_10_202405.ubj.zip), and simply caching pip will not help with that.

In that case we effectively have two main approaches:

  1. Mock out (or skip) the large-model download during normal CI:
    (i) If this is truly a unit test, it should not be hitting a remote service or a giant model, so we could mock or stub out that call.
    (ii) If we really do want to integration-test the large model, marking that test as "slow" or "integration" and only running it in a nightly/scheduled job, rather than on every PR, sounds like a feasible option.

  2. Cache the model in the Actions workflow (a rough sketch of this is below): we'll have to be aware that GitHub caches are limited to a couple of GB per key, so if the model is extremely large we may need an alternative like using Hugging Face's built-in caching.

What do you think, @zakwatts @peterdudfield?
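
A minimal sketch of option 2, assuming the downloaded archive and its unzipped contents end up under a models/ directory in the workspace (the path and cache key below are assumptions about the repo layout, not confirmed):

```yaml
# Sketch only: an extra cache step to sit alongside the existing "Cache pip"
# step in the reusable workflow. Path and key are assumed, not confirmed.
- name: Cache model files
  uses: actions/cache@v3
  with:
    # Hypothetical location of the downloaded/unzipped model
    path: models/
    key: ${{ runner.os }}-model-v2-model_10_202405
    restore-keys: ${{ runner.os }}-model-
```

If option 1 is preferred instead, tests marked with a custom marker (e.g. `slow`) could be deselected on PR builds with `pytest -m "not slow"` and run only in a scheduled job.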

@hrsvrn (Contributor) commented Mar 21, 2025

@aryanbhosale did you try implementing caching of the Hugging Face model?

Merging this pull request may close: CI tests take 20 mins (#231)