Commit b05df63

Merge pull request #26 from Sherlock113/docs/kv-cache-offloading
docs: Fix link
2 parents e32b6d9 + d5166f7

File tree

1 file changed: +1, -1 lines changed


docs/inference-optimization/prefix-caching.md

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ In agent workflows, the benefit is even more pronounced. Some use cases have inp

 For applications with long, repetitive prompts, prefix caching can significantly reduce both latency and cost. Over time, however, your KV cache size can be quite large. GPU memory is finite, and storing long prefixes across many users can eat up space quickly. You’ll need cache eviction strategies or memory tiering.

-The open-source community is actively working on distributed serving strategies. See [prefix-aware routing](./prefix-caching-cache-aware-routing) for details.
+The open-source community is actively working on distributed serving strategies. See [prefix-aware routing](./prefix-aware-routing) for details.

 ---
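The changed paragraph mentions cache eviction strategies for a growing KV cache. As a minimal sketch of one such strategy, here is a toy LRU-evicting prefix cache in Python. This is a hypothetical illustration only (the `PrefixKVCache` class and its names are invented here, not part of the docs being edited or of any serving framework); real systems evict at the block level and may tier evicted entries to CPU memory or disk rather than dropping them.

```python
from collections import OrderedDict

class PrefixKVCache:
    """Toy prefix cache with LRU eviction (hypothetical sketch).

    Maps a prompt-prefix key to its cached KV data; once the entry
    count exceeds `capacity`, the least recently used prefix is evicted.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache: "OrderedDict[str, object]" = OrderedDict()

    def get(self, prefix: str):
        if prefix not in self._cache:
            return None  # cache miss: KV for this prefix must be recomputed
        self._cache.move_to_end(prefix)  # mark as most recently used
        return self._cache[prefix]

    def put(self, prefix: str, kv_blocks) -> None:
        self._cache[prefix] = kv_blocks
        self._cache.move_to_end(prefix)
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used entry
```

A memory-tiering variant would, instead of dropping the evicted entry in `popitem`, move it to a slower tier (e.g. CPU RAM) and fetch it back on a later hit.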

0 commit comments
