System Info
We've noticed that when there's a mismatch between the dtype of the lora_plugin used while building the engine and the dtype passed as --storage-type when calling hf_lora_convert, the LoRA weights are not applied at all and we get the base model response, even when passing the correct lora_task_id. This happens without any warnings or errors, which makes it hard to know what the issue is.
Example:
trtllm-build \
--checkpoint_dir ${UNIFIED_CKPT_PATH} \
--output_dir ${ENGINE_PATH} \
--lora_plugin bfloat16
and
python3 tensorrt_llm/examples/hf_lora_convert.py -i ${ENGINE_PATH}/lora/0 -o tmp/lora_prefetch/1 --storage-type float16
will always lead to the base model response during inference.
However, switching the build lora_plugin to either auto or float16 returns the right response.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- run trtllm-build with one lora_plugin dtype and hf_lora_convert with a different --storage-type dtype (a script to read back the engine's build dtype is sketched below)
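For anyone reproducing this, one way to confirm which dtype the engine was actually built with is to read it back from the engine's config.json and compare it against the --storage-type passed to hf_lora_convert.py. This is only a rough sketch: the key path ("build_config" -> "plugin_config" -> "lora_plugin") is an assumption about the config layout and may differ between TensorRT-LLM versions.

# Rough diagnostic sketch (not part of TensorRT-LLM): print the lora_plugin
# dtype recorded in the engine's config.json so it can be compared with the
# --storage-type used for hf_lora_convert.py. The key path below is an
# assumption and may differ between versions.
import json
import sys

engine_dir = sys.argv[1]  # e.g. ${ENGINE_PATH}

with open(f"{engine_dir}/config.json") as f:
    config = json.load(f)

lora_plugin_dtype = config["build_config"]["plugin_config"]["lora_plugin"]
print("engine lora_plugin dtype:", lora_plugin_dtype)
print("this should match the --storage-type passed to hf_lora_convert.py")

In our failing case this would have shown bfloat16 on the engine side against float16 on the conversion side.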
Expected behavior
A warning or error if LoRA doesn't work due to this dtype mismatch
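Even a simple check along these lines in the LoRA loading path would be enough to surface the problem. The function and argument names below are purely illustrative, not actual TensorRT-LLM internals:

# Illustrative sketch only; names are made up and not TensorRT-LLM internals.
import logging

logger = logging.getLogger("tensorrt_llm.lora")

def check_lora_dtype(engine_lora_plugin_dtype: str, lora_weights_dtype: str) -> None:
    # Warn (or raise) when the engine's lora_plugin dtype does not match the
    # dtype the LoRA weights were converted to with hf_lora_convert.
    if engine_lora_plugin_dtype != lora_weights_dtype:
        logger.warning(
            "LoRA weights dtype (%s) does not match the engine lora_plugin "
            "dtype (%s); the adapter may be silently ignored and the base "
            "model output returned.",
            lora_weights_dtype,
            engine_lora_plugin_dtype,
        )

For the combination above, check_lora_dtype("bfloat16", "float16") would have produced a warning instead of a silent fallback to the base model.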
actual behavior
Fails silently: the base model response is returned and no warning or error is emitted
additional notes
We used the llama3 example