Update for new version of HF transformers. #104
+14 −14
We've recently merged a layer-wise refactor of the cache system in Transformers: huggingface/transformers#39106.
While testing your repo for compatibility, I had to adapt parts of the code to the new interface. To help with the migration, I've included my changes below. These are not intended as a full PR (I've only tested a small subset), but they should serve as a helpful guide.
Some updates are deprecations (e.g., `cache.key_cache[i]` is still supported via a backward-compatibility layer, though `cache.layers[i].keys` is preferred). However, there are also breaking changes, particularly in private attributes: for example, `cache._quantized_key_cache` is now `cache.cache_processor._quantized_keys`.
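For downstream code that needs to run on both versions, here is a minimal sketch of what the rename looks like. The `keys`/`key_cache` attributes are the ones quoted above; the matching `values`/`value_cache` attributes and the `get_layer_kv` helper are my own illustrative assumptions, not part of the official API:

```python
# Minimal sketch, not tested against the final release: `get_layer_kv` is a
# hypothetical helper, and the `values` / `value_cache` attributes are assumed
# to mirror the `keys` / `key_cache` attributes mentioned above.
def get_layer_kv(past_key_values, layer_idx):
    if hasattr(past_key_values, "layers"):
        # New layer-wise interface on transformers main (post #39106).
        layer = past_key_values.layers[layer_idx]
        return layer.keys, layer.values
    # Legacy interface, still reachable through the deprecation shim.
    return (
        past_key_values.key_cache[layer_idx],
        past_key_values.value_cache[layer_idx],
    )
```

Feature-detecting with `hasattr` avoids pinning to a specific version string while both interfaces are in circulation.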
I also encountered some CUDA illegal memory access errors, which I suspect are related to huggingface/transformers#39474 and to the contiguous-memory requirements of FlashAttention v2.
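On the FlashAttention side, this is a hedged sketch of the kind of guard I tried; whether non-contiguous key/value slices are really the cause of the illegal memory access is only a suspicion, and `safe_flash_attn` is just an illustrative wrapper:

```python
from flash_attn import flash_attn_func  # FlashAttention v2

def safe_flash_attn(q, k, v, causal=True):
    # FlashAttention v2 kernels expect contiguous (batch, seqlen, n_heads, head_dim)
    # tensors; key/value slices pulled out of the refactored cache are not
    # guaranteed to be contiguous, which could explain the illegal memory access.
    q, k, v = (t.contiguous() for t in (q, k, v))
    return flash_attn_func(q, k, v, causal=causal)
```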
In short, the upcoming Transformers release introduces necessary but potentially breaking changes that may impact this repo. I recommend testing against the `main` branch, and I'm happy to help if further issues come up.