Conversation

@manueldeprada

We've recently merged a layer-wise refactor of the cache system in Transformers: huggingface/transformers#39106.

While testing your repo for compatibility, I had to adapt parts of the code to the new interface. To help with the migration, I've included my changes below. These are not intended as a full PR (I've only tested a small subset), but they should serve as a helpful guide.

Some updates are deprecations (e.g., cache.key_cache[i] is still supported via a backward-compatibility layer, though cache.layers[i].keys is preferred). However, there are also breaking changes, particularly in private attributes: for example, cache._quantized_key_cache is now cache.cache_processor._quantized_keys.
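For concreteness, here is a minimal sketch of the old versus new access patterns described above. The `layer_idx` variable is illustrative, and I'm assuming the value-side attributes (`value_cache`, `.values`) mirror the key side, as they do on main:

```python
# Minimal sketch of the cache-access change, assuming `cache` is a DynamicCache
# and `layer_idx` is the index of the layer being edited (illustrative names).

# Deprecated, but still supported via the backward-compatibility layer:
keys = cache.key_cache[layer_idx]
values = cache.value_cache[layer_idx]

# Preferred layer-wise interface after the refactor:
keys = cache.layers[layer_idx].keys
values = cache.layers[layer_idx].values

# Private attributes moved without a compatibility shim, e.g. for quantized caches:
#   cache._quantized_key_cache  ->  cache.cache_processor._quantized_keys
```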

I also encountered some CUDA illegal memory access errors, which I suspect are related to huggingface/transformers#39474 and the contiguous-memory requirements of FlashAttention v2.
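In case it helps while debugging, here is the kind of guard that can rule out the contiguity angle. This is just a sketch under the layer-wise interface above, not a verified fix:

```python
# Debugging sketch (an assumption, not a verified fix): FlashAttention-2 kernels
# expect contiguous inputs, so a pruned or sliced cache may need an explicit
# .contiguous() before the attention call.
layer = cache.layers[layer_idx]
if not layer.keys.is_contiguous():
    layer.keys = layer.keys.contiguous()
if not layer.values.is_contiguous():
    layer.values = layer.values.contiguous()
```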

In short, the upcoming Transformers release introduces necessary but potentially breaking changes that may impact this repo. I recommend testing against the main branch, and I'm happy to help if further issues come up.

@maxjeblick
Collaborator

Thanks a lot for opening this PR; we really appreciate the proactive engagement!

We merged KvZipPress; this press will also require some updates. It would be great if you could update this PR to cover it.

Regarding the next steps:

@maxjeblick
Collaborator

Hi @manueldeprada,
As you may have noticed, the new refactoring of the attention implementation in transformers, along with some other changes, currently breaks kvpress.

As this is a larger topic, the maintainers of this repo are currently working on a fix.

@manueldeprada
Author

Great! Please make sure to clone the main branch; we recently merged a further simplification of KV caches:
huggingface/transformers#39797

Hopefully this is the final stable interface!

This PR should provide enough inspiration to quickly adapt KVPress. Ping me if there are further pain points!
