[Bug] AttributeError: 'MHATokenToKVPoolHost' object has no attribute 'start_layer' when using --enable-hierarchical-cache #6005

@Simon-Li

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Running the SGLang server with --enable-hierarchical-cache fails with an AttributeError: 'MHATokenToKVPoolHost' object has no attribute 'start_layer'. The error occurs within the cache loading thread (load_thread_func_layer_by_layer) when accessing the memory pool (memory_pool.py, line 955).

Bug Description:
The issue persists even when Data Parallelism is set to 1 (--dp=1). However, the server launches successfully if --enable-hierarchical-cache and related flags are removed, confirming the problem lies specifically with the hierarchical caching implementation.

Expected Behavior:
The SGLang server should serve the bench_multiturn workload without errors when hierarchical caching is enabled.

Observed Behavior (Error Traceback):
Exception in thread Thread-13 (load_thread_func_layer_by_layer):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/sglang/python/sglang/srt/managers/cache_controller.py", line 338, in load_thread_func_layer_by_layer
    flat_data = self.mem_pool_host.get_flat_data_by_layer(
  File "/root/sglang/python/sglang/srt/mem_cache/memory_pool.py", line 955, in get_flat_data_by_layer
    return self.kv_buffer[:, layer_id - self.start_layer, indices]
AttributeError: 'MHATokenToKVPoolHost' object has no attribute 'start_layer'

(The log contained four interleaved copies of this identical traceback; one copy is shown.)
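The traceback shows a method reading self.start_layer on an object that never had that attribute assigned. The sketch below reproduces that failure mode in isolation. It is not SGLang code: HostPoolSketch, PatchedHostPoolSketch, the nested-list kv_buffer layout, and the start_layer=0 default are all hypothetical stand-ins, and the actual fix in SGLang may look different.

```python
class HostPoolSketch:
    """Hypothetical stand-in for a host-side KV pool (not SGLang code)."""

    def __init__(self, layer_num, size):
        # kv_buffer[kv][layer][token]; kv is 0 for keys and 1 for values.
        self.kv_buffer = [
            [[(kv, layer, tok) for tok in range(size)] for layer in range(layer_num)]
            for kv in range(2)
        ]
        # start_layer is deliberately never assigned here.

    def get_flat_data_by_layer(self, indices, layer_id):
        # Reading self.start_layer fails because __init__ did not set it.
        local_layer = layer_id - self.start_layer
        return [[self.kv_buffer[kv][local_layer][i] for i in indices] for kv in range(2)]


class PatchedHostPoolSketch(HostPoolSketch):
    """Same pool, but start_layer is set at construction (assumed default: 0)."""

    def __init__(self, layer_num, size, start_layer=0):
        super().__init__(layer_num, size)
        self.start_layer = start_layer


def try_load(pool):
    """Return the loaded data, or the AttributeError message if the read fails."""
    try:
        return pool.get_flat_data_by_layer([1, 3], layer_id=1)
    except AttributeError as exc:
        return str(exc)


broken_result = try_load(HostPoolSketch(layer_num=2, size=4))
patched_result = try_load(PatchedHostPoolSketch(layer_num=2, size=4))
```

Under these assumptions, the first call fails the same way as the cache loading thread, while the second returns the key and value entries for the requested layer. Whether 0 is the right value for start_layer on the host pool is for the maintainers to confirm.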

Additional Context:

  • Confirmed that running without --enable-hierarchical-cache (using only --tp=4) works correctly.
  • Confirmed the error occurs with both --dp=1 and --dp=2 when hierarchical cache is enabled.

Reproduction

  1. Check out SGLang commit: c5645e928f0bf989510dcd707d31249c63c57e37
  2. Have the Qwen3-14B model available (e.g., at ~/models/Qwen3-14B).
  3. Run the following commands (start the server first, then run the benchmark from a second shell):
    python3 -m sglang.launch_server --model-path ~/models/Qwen3-14B --port 30000 \
                --enable-hierarchical-cache \
                --mem-fraction-static 0.8 \
                --hicache-ratio 2 \
                --enable-cache-report \
                --enable-metrics \
                --tp=4 \
                --dp=1
    
    python3 bench_multiturn.py --model-path ~/models/Qwen3-14B \
                --dataset-path ~/models/ShareGPT_V3_unfiltered_cleaned_split/ShareGPT_V3_unfiltered_cleaned_split.json

Environment

SGLang Version/Commit: c5645e928f0bf989510dcd707d31249c63c57e37
Model: Qwen3-14B
PyTorch Version: 2.6.0
CUDA Version: 12.4
GPU Model: NVIDIA A10
Operating System: Ubuntu 22.04.5 LTS
