Description
While following the multimodal workflow guide for Triton Server, I encountered an assertion error:
AssertionError: Vision preprocessor for preparing images before encoding is None
Upon investigation, I noticed that VisionPreProcessor is only initialized for mllama, llava_onevision, and qwen2_vl:
Code Reference
However, 'llava' is included in an earlier assertion confirming it as a supported model type. This mismatch causes a failure when running inference.
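To make the mismatch concrete, here is a minimal sketch of the pattern described above. The names (SUPPORTED_TYPES, build_vision_preprocessor) are illustrative, not the repository's actual identifiers; the point is only that 'llava' passes the supported-type assertion but never gets a preprocessor instance:

```python
# Hedged reconstruction of the reported mismatch, not the repo's exact code.
SUPPORTED_TYPES = {"llava", "mllama", "llava_onevision", "qwen2_vl"}

def build_vision_preprocessor(model_type):
    # The earlier assertion accepts 'llava' as a supported model type...
    assert model_type in SUPPORTED_TYPES, f"unsupported model type: {model_type}"
    # ...but the preprocessor is only constructed for three of the four,
    # so 'llava' falls through and the caller later hits
    # "Vision preprocessor ... is None".
    if model_type in ("mllama", "llava_onevision", "qwen2_vl"):
        return object()  # stands in for VisionPreProcessor(...)
    return None
```

With this shape, `build_vision_preprocessor("llava")` returns None even though the assertion passed, which reproduces the failure mode.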
Proposed Fix:
I recommend adding a llava_process method to VisionPreProcessor, ensuring LLaVA models correctly initialize preprocessing when needed:
VisionPreProcessor class
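A minimal sketch of what the proposed llava_process addition could look like, assuming the class dispatches on model type and that LLaVA images only need a standard image-processor call. The method and attribute names here (llava_process, image_processor, process) are assumptions for illustration, not the backend's confirmed API:

```python
# Sketch only: assumes a dispatch-by-model-type design for VisionPreProcessor.
class VisionPreProcessor:
    def __init__(self, model_type, image_processor):
        self.model_type = model_type
        # image_processor stands in for e.g. a Hugging Face image processor
        self.image_processor = image_processor

    def llava_process(self, images):
        # Proposed addition: run the standard image processor per image
        # and return pixel values in the shape downstream code expects.
        return {"pixel_values": [self.image_processor(img) for img in images]}

    def process(self, images):
        # Dispatch to the model-type-specific handler; fail loudly if missing.
        handler = getattr(self, f"{self.model_type}_process", None)
        assert handler is not None, (
            f"Vision preprocessor for model type {self.model_type} is None")
        return handler(images)
```

Usage would then be symmetric with the existing mllama/llava_onevision/qwen2_vl paths: constructing VisionPreProcessor("llava", processor) and calling process(images) instead of skipping initialization.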
Questions for Maintainers:
- Was LLaVA deliberately excluded from the vision preprocessing logic?
- Would extending VisionPreProcessor in this way be the best approach?
- Are there other dependencies or configurations I should check before implementing this change?
Please advise on whether this approach aligns with your intended workflow. Thanks!