Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
Running the SGLang server with --enable-hierarchical-cache fails with an AttributeError: 'MHATokenToKVPoolHost' object has no attribute 'start_layer'. The error occurs within the cache loading thread (load_thread_func_layer_by_layer) when accessing the memory pool (memory_pool.py, line 955).
Bug Description:
The issue persists even when Data Parallelism is set to 1 (--dp=1). However, the server launches successfully if --enable-hierarchical-cache and related flags are removed, confirming the problem lies specifically with the hierarchical caching implementation.
Expected Behavior:
The SGLang server should run without errors when hierarchical caching is enabled while bench_multiturn is executing.
Observed Behavior (Error Traceback):
The same traceback was printed four times with its lines interleaved; one copy is shown here:
Exception in thread Thread-13 (load_thread_func_layer_by_layer):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/sglang/python/sglang/srt/managers/cache_controller.py", line 338, in load_thread_func_layer_by_layer
    flat_data = self.mem_pool_host.get_flat_data_by_layer(
  File "/root/sglang/python/sglang/srt/mem_cache/memory_pool.py", line 955, in get_flat_data_by_layer
    return self.kv_buffer[:, layer_id - self.start_layer, indices]
AttributeError: 'MHATokenToKVPoolHost' object has no attribute 'start_layer'
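To illustrate the failure mode in the traceback above, here is a minimal sketch in plain Python. It is not SGLang's code: HostPoolSketch, its constructor arguments, and the nested-list buffer layout are hypothetical stand-ins, and the only line taken from the report is the layer_id - self.start_layer lookup. It assumes the error arises because the host pool object never has start_layer assigned, which is what the AttributeError message suggests but which has not been verified against the repository.

```python
class HostPoolSketch:
    """Hypothetical stand-in for a host-side KV pool (not SGLang's real class)."""

    def __init__(self, layer_num, size, define_start_layer):
        # kv_buffer[kv][layer][token]: kv index 0 holds keys, 1 holds values.
        self.kv_buffer = [
            [[f"{name}{layer}_{tok}" for tok in range(size)]
             for layer in range(layer_num)]
            for name in ("k", "v")
        ]
        if define_start_layer:
            # Assumption: a pool that owns every layer starts at layer 0.
            self.start_layer = 0

    def get_flat_data_by_layer(self, indices, layer_id):
        # Same attribute access as in the traceback: raises AttributeError
        # when start_layer was never assigned on this object.
        local_layer = layer_id - self.start_layer
        return [
            [self.kv_buffer[kv][local_layer][i] for i in indices]
            for kv in range(2)
        ]


def try_load(pool):
    """Return the loaded data, or the error message if the lookup fails."""
    try:
        return pool.get_flat_data_by_layer([0, 2], layer_id=1)
    except AttributeError as exc:
        return str(exc)


broken = try_load(HostPoolSketch(2, 4, define_start_layer=False))
working = try_load(HostPoolSketch(2, 4, define_start_layer=True))
print(broken)
print(working)
```

In this sketch the first call reports the missing attribute and the second returns the key and value entries for layer 1. Whether the actual fix is to assign start_layer in the host pool's constructor or to stop referencing it in get_flat_data_by_layer is a decision for the maintainers.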
Additional Context:
- Confirmed that running without --enable-hierarchical-cache (using only --tp=4) works correctly.
- Confirmed the error occurs with both --dp=1 and --dp=2 when hierarchical cache is enabled.
Reproduction
- Check out SGLang commit: c5645e928f0bf989510dcd707d31249c63c57e37
- Have the Qwen3-14B model available (e.g., at ~/models/Qwen3-14B).
- Run the following commands:
python3 -m sglang.launch_server --model-path ~/models/Qwen3-14B --port 30000 \
    --enable-hierarchical-cache \
    --mem-fraction-static 0.8 \
    --hicache-ratio 2 \
    --enable-cache-report \
    --enable-metrics \
    --tp=4 \
    --dp=1

python3 bench_multiturn.py --model-path ~/models/Qwen3-14B \
    --dataset-path ~/models/ShareGPT_V3_unfiltered_cleaned_split/ShareGPT_V3_unfiltered_cleaned_split.json
Environment
SGLang Version/Commit: c5645e928f0bf989510dcd707d31249c63c57e37
Model: Qwen3-14B
PyTorch Version: 2.6.0
CUDA Version: 12.4
GPU Model: NVIDIA A10
Operating System: Ubuntu 22.04.5 LTS