Commit d5166f7

Fix link
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
1 parent 5e2f6f5 commit d5166f7

1 file changed (+1, -1 lines changed)

docs/inference-optimization/prefix-caching.md

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ In agent workflows, the benefit is even more pronounced. Some use cases have inp
 
 For applications with long, repetitive prompts, prefix caching can significantly reduce both latency and cost. Over time, however, your KV cache size can be quite large. GPU memory is finite, and storing long prefixes across many users can eat up space quickly. You’ll need cache eviction strategies or memory tiering.
 
-The open-source community is actively working on distributed serving strategies. See [prefix-aware routing](./prefix-caching-cache-aware-routing) for details.
+The open-source community is actively working on distributed serving strategies. See [prefix-aware routing](./prefix-aware-routing) for details.
 
 ---
