[Bug] AttributeError: 'MHATokenToKVPoolHost' object has no attribute 'start_layer' when using --enable-hierarchical-cache #6005

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.

### Describe the bug

Running the SGLang server with `--enable-hierarchical-cache` fails with `AttributeError: 'MHATokenToKVPoolHost' object has no attribute 'start_layer'`. The exception is raised in the cache loading thread (`load_thread_func_layer_by_layer`, `cache_controller.py` line 338) when it calls `get_flat_data_by_layer` on the host memory pool (`memory_pool.py`, line 955).

**Bug Description:**
The issue persists even when Data Parallelism is set to 1 (`--dp=1`). However, the server launches successfully if `--enable-hierarchical-cache` and related flags are removed, confirming the problem lies specifically with the hierarchical caching implementation.

**Expected Behavior:**
With hierarchical caching enabled, the SGLang server should serve the `bench_multiturn.py` workload without raising errors.

**Observed Behavior (Error Traceback):**
The following traceback appears four times in the log, with lines from the copies interleaved:

Exception in thread Thread-13 (load_thread_func_layer_by_layer):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/sglang/python/sglang/srt/managers/cache_controller.py", line 338, in load_thread_func_layer_by_layer
    flat_data = self.mem_pool_host.get_flat_data_by_layer(
  File "/root/sglang/python/sglang/srt/mem_cache/memory_pool.py", line 955, in get_flat_data_by_layer
    return self.kv_buffer[:, layer_id - self.start_layer, indices]
AttributeError: 'MHATokenToKVPoolHost' object has no attribute 'start_layer'

**Additional Context:**
* Confirmed that running without `--enable-hierarchical-cache` (using only `--tp=4`) works correctly.
* Confirmed the error occurs with both `--dp=1` and `--dp=2` when hierarchical cache is enabled.
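
The traceback suggests that `get_flat_data_by_layer` reads `self.start_layer` on a host pool object where that attribute was never assigned. Below is a minimal sketch of that failure mode. `HostKVPoolSketch` and `load_in_thread` are hypothetical names used only for illustration, not SGLang's code, and defaulting `start_layer` to 0 is an assumed workaround that I have not verified against the real classes.

```python
import threading


class HostKVPoolSketch:
    """Hypothetical stand-in for a host-side KV pool; not SGLang's actual class."""

    def __init__(self, layer_num, size, set_start_layer=False):
        # kv_buffer[k_or_v][layer][token_slot], built from plain lists for illustration.
        self.kv_buffer = [[[0] * size for _ in range(layer_num)] for _ in range(2)]
        if set_start_layer:
            # Assumed workaround: treat the pool as starting at layer 0.
            self.start_layer = 0

    def get_flat_data_by_layer(self, indices, layer_id):
        # Raises AttributeError when start_layer was never assigned in __init__.
        layer = layer_id - self.start_layer
        return [[self.kv_buffer[kv][layer][i] for i in indices] for kv in range(2)]


def load_in_thread(pool):
    """Run the lookup in a worker thread and return any AttributeError it raised."""
    errors = []

    def target():
        try:
            pool.get_flat_data_by_layer([0, 1], layer_id=0)
        except AttributeError as exc:
            errors.append(exc)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return errors


broken = load_in_thread(HostKVPoolSketch(layer_num=2, size=4))
fixed = load_in_thread(HostKVPoolSketch(layer_num=2, size=4, set_start_layer=True))
print(broken)
print(fixed)
```

In the sketch, the pool built without `start_layer` fails in the worker thread in the same way as the traceback, while the pool that sets it does not. Where the attribute should actually be initialized in SGLang is for the maintainers to decide.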

### Reproduction
1.  Check out SGLang commit: `c5645e928f0bf989510dcd707d31249c63c57e37`
2.  Have the Qwen3-14B model available (e.g., at `~/models/Qwen3-14B`).
3.  Run the following command:
    ```bash
    python3 -m sglang.launch_server --model-path ~/models/Qwen3-14B --port 30000 \
                --enable-hierarchical-cache \
                --mem-fraction-static 0.8 \
                --hicache-ratio 2 \
                --enable-cache-report \
                --enable-metrics \
                --tp=4 \
                --dp=1

    # In a second terminal, after the server has finished starting:
    python3 bench_multiturn.py --model-path ~/models/Qwen3-14B \
                --dataset-path ~/models/ShareGPT_V3_unfiltered_cleaned_split/ShareGPT_V3_unfiltered_cleaned_split.json
    ```
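
The benchmark needs the server from the first command to be accepting requests, so a small readiness check can help when scripting these steps. The sketch below uses only the Python standard library. `wait_for_server` is a hypothetical helper, and the `/health` path in the usage note is an assumption about the server's HTTP interface; adjust it if your build exposes a different endpoint.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout_s=300.0, interval_s=2.0):
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not reachable yet (or returned an error); retry.
        time.sleep(interval_s)
    return False
```

For example, `wait_for_server("http://127.0.0.1:30000/health")` could be called before launching `bench_multiturn.py`, with the port taken from the launch command above.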


### Environment
* SGLang Version/Commit: `c5645e928f0bf989510dcd707d31249c63c57e37`
* Model: Qwen3-14B
* PyTorch Version: 2.6.0
* CUDA Version: 12.4
* GPU Model: NVIDIA A10
* Operating System: Ubuntu 22.04.5 LTS
