Description
Hi Team, I am trying to follow the guide here to use the mBART model with Triton Inference Server - https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/encoder_decoder.md - and the output from my model is empty. On further debugging, I realized that the example Triton server configs referenced here - https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/encoder_decoder.md#4-prepare-tritonserver-configs- (from `tensorrt_llm/triton_backend/all_models/inflight_batcher_llm`) don't actually use the encoder anywhere. The inputs to the `tensorrt_llm` model are the `input_ids` coming from the `preprocessor`, and neither the `preprocessor` nor `tensorrt_llm` uses the encoder parameter from `config.pbtxt`.
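For reference, this is roughly the kind of stanza I mean in the `tensorrt_llm` model's `config.pbtxt` (a paraphrased sketch; the exact parameter name, placeholder, and input list in the repo may differ from what I show here):

```pbtxt
# Decoder-side inputs that the ensemble actually wires up (sketch, not verbatim):
input [
  { name: "input_ids"      data_type: TYPE_INT32 dims: [ -1 ] },
  { name: "input_lengths"  data_type: TYPE_INT32 dims: [ 1 ]  }
]

# Encoder-related parameter that, as far as I can tell, nothing downstream consumes
# (the key/placeholder names here are my approximation of the template):
parameters: {
  key: "encoder_model_path"
  value: { string_value: "${encoder_engine_dir}" }
}
```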
Am I missing something?