Commit b05df63

Merge pull request #26 from Sherlock113/docs/kv-cache-offloading
docs: Fix link
2 parents e32b6d9 + d5166f7

File tree

1 file changed: +1, -1 lines changed


docs/inference-optimization/prefix-caching.md

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ In agent workflows, the benefit is even more pronounced. Some use cases have inp

 For applications with long, repetitive prompts, prefix caching can significantly reduce both latency and cost. Over time, however, your KV cache size can be quite large. GPU memory is finite, and storing long prefixes across many users can eat up space quickly. You’ll need cache eviction strategies or memory tiering.

-The open-source community is actively working on distributed serving strategies. See [prefix-aware routing](./prefix-caching-cache-aware-routing) for details.
+The open-source community is actively working on distributed serving strategies. See [prefix-aware routing](./prefix-aware-routing) for details.

 ---
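The changed paragraph mentions cache eviction strategies for a growing KV cache. As a minimal sketch of one such strategy, here is a toy LRU-evicting prefix cache in Python. This is a hypothetical illustration only (the `PrefixKVCache` class and its names are invented here, not part of the docs being edited or of any serving framework); real systems evict at the block level and may tier evicted entries to CPU memory or disk rather than dropping them.

```python
from collections import OrderedDict

class PrefixKVCache:
    """Toy prefix cache with LRU eviction (hypothetical sketch).

    Maps a prompt-prefix key to its cached KV data; once the entry
    count exceeds `capacity`, the least recently used prefix is evicted.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache: "OrderedDict[str, object]" = OrderedDict()

    def get(self, prefix: str):
        if prefix not in self._cache:
            return None  # cache miss: KV for this prefix must be recomputed
        self._cache.move_to_end(prefix)  # mark as most recently used
        return self._cache[prefix]

    def put(self, prefix: str, kv_blocks) -> None:
        self._cache[prefix] = kv_blocks
        self._cache.move_to_end(prefix)
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used entry
```

A memory-tiering variant would, instead of dropping the evicted entry in `popitem`, move it to a slower tier (e.g. CPU RAM) and fetch it back on a later hit.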

0 commit comments
