
LoRA weights not applied, without warnings/errors, when there is a dtype mismatch #791

@rahchuenmonroe

Description


System Info

We've noticed that when the dtype passed to --lora_plugin while building the engine does not match the dtype passed as --storage-type to hf_lora_convert, the LoRA weights are not applied at all and we get the base model's response, even when passing the correct lora_task_id. This happens without any warnings or errors, which makes it hard to track down the issue.

Example:

trtllm-build \
    --checkpoint_dir ${UNIFIED_CKPT_PATH} \
    --output_dir ${ENGINE_PATH} \
    --lora_plugin bfloat16 

and

python3 tensorrt_llm/examples/hf_lora_convert.py -i ${ENGINE_PATH}/lora/0 -o tmp/lora_prefetch/1 --storage-type float16

will always lead to the base model's response during inference.

However, switching the build-time --lora_plugin to either auto or float16 returns the correct response.
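
For what it's worth, this is roughly how we confirmed the mismatch after the fact. It is only a sketch: the config.json keys and the model.lora_weights.npy file name are assumptions based on what our local outputs looked like, and the paths are placeholders for the ${ENGINE_PATH} and -o directories used above.

import json
import numpy as np

# Placeholder paths; substitute the ${ENGINE_PATH} and -o dir used above.
engine_path = "tmp/engine"
lora_out = "tmp/lora_prefetch/1"

# dtype the engine was built with (assuming trtllm-build records it in
# config.json under build_config.plugin_config.lora_plugin).
with open(f"{engine_path}/config.json") as f:
    engine_lora_dtype = json.load(f)["build_config"]["plugin_config"]["lora_plugin"]

# dtype hf_lora_convert actually stored (assuming it writes
# model.lora_weights.npy; note numpy has no native bfloat16, so a bfloat16
# export may show up under a different dtype name).
lora_dtype = str(np.load(f"{lora_out}/model.lora_weights.npy").dtype)

print(f"engine lora_plugin dtype : {engine_lora_dtype}")
print(f"converted LoRA dtype     : {lora_dtype}")
if lora_dtype != engine_lora_dtype:
    print("WARNING: dtype mismatch -- the LoRA weights will likely be ignored")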

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Run trtllm-build with --lora_plugin and hf_lora_convert with --storage-type set to different dtypes (e.g. bfloat16 vs. float16)

Expected behavior

A warning or error when the LoRA weights cannot be applied because of this dtype mismatch.
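
Even a simple guard at LoRA load time would surface the problem. A minimal sketch of the behavior we'd expect (the function and argument names below are made up for illustration, not actual TensorRT-LLM internals):

def check_lora_dtype(engine_lora_plugin_dtype: str, lora_weights_dtype: str) -> None:
    # Hypothetical guard: fail loudly instead of silently ignoring the adapter
    # when the converted LoRA weights were stored in a different dtype than
    # the engine's lora_plugin.
    if lora_weights_dtype != engine_lora_plugin_dtype:
        raise ValueError(
            f"LoRA weights dtype ({lora_weights_dtype}) does not match the "
            f"engine lora_plugin dtype ({engine_lora_plugin_dtype}); the "
            "adapter would be ignored. Re-run hf_lora_convert with "
            f"--storage-type {engine_lora_plugin_dtype} or rebuild the engine "
            "with a matching --lora_plugin."
        )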

actual behavior

Fails silently: inference returns the base model response with no warning or error.

additional notes

We used the Llama 3 example.
