[Feature] mm and thinking model support structured output #2749
base: develop
Conversation
Thanks for your contribution!
force-pushed from d07f737 to 72de4a3
Pull Request Overview
This PR adds structured output support via guided decoding (reasoning parsers) for multi-modal and thinking models, including offline inference capabilities.
- Introduce a new `--reasoning_parser` CLI argument and propagate it through the configuration to the model runners.
- Extend the sampling and guided decoding pipeline: an updated `Sampler`, guided backend interfaces, and skip-index logic.
- Enhance `SamplingParams` with `GuidedDecodingParams` and document offline inference usage for structured outputs.
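The overview's offline-inference point can be sketched as follows. This is a minimal, hypothetical mock of the shapes involved, not the actual FastDeploy API; the field names inside `GuidedDecodingParams` and `SamplingParams` are illustrative assumptions.

```python
# Hypothetical sketch: attaching guided-decoding constraints to sampling
# parameters for offline structured output. Field names (json, regex, choice)
# are illustrative; the real FastDeploy classes may differ.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GuidedDecodingParams:
    json: Optional[str] = None          # JSON schema constraint
    regex: Optional[str] = None         # regex constraint
    choice: Optional[List[str]] = None  # fixed set of allowed outputs


@dataclass
class SamplingParams:
    temperature: float = 0.8
    guided_decoding: Optional[GuidedDecodingParams] = None


# An offline request constrained to a JSON schema:
params = SamplingParams(
    temperature=0.2,
    guided_decoding=GuidedDecodingParams(json='{"type": "object"}'),
)
assert params.guided_decoding.json == '{"type": "object"}'
```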
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
File | Description
---|---
`fastdeploy/worker/worker_process.py` | Add the `--reasoning_parser` CLI argument and integrate it into `FDConfig`.
`fastdeploy/worker/vl_gpu_model_runner.py` | Initialize the guided backend and reasoning parser; update the guided decoding flow in the GPU model runner.
`fastdeploy/model_executor/layers/sample/sampler.py` | Enhance `Sampler` to support reasoning parsing and skip indices when masking tokens.
`fastdeploy/engine/sampling_params.py` | Introduce `GuidedDecodingParams` in `SamplingParams` for offline structured inference.
`docs/features/structured_outputs.md` | Add offline inference examples for structured output using `GuidedDecodingParams`.
Comments suppressed due to low confidence (3)
fastdeploy/worker/vl_gpu_model_runner.py:145
- The code checks for `guided_json`, `guided_regex`, `guided_grammar`, and `structural_tag`, but does not handle `guided_choice` from `GuidedDecodingParams`. Add support for `guided_choice` to ensure all constraint types are honored.
elif request.guided_grammar is not None:
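The missing branch the comment describes could look like the following sketch. The constraint field names mirror the comment; the dataclass and the returned labels are illustrative scaffolding, not FastDeploy's real runner code.

```python
# Hypothetical sketch of dispatching on guided-decoding constraint fields,
# including the guided_choice branch the review says is missing. The
# GuidedParams dataclass and return labels are illustrative.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GuidedParams:
    guided_json: Optional[str] = None
    guided_regex: Optional[str] = None
    guided_grammar: Optional[str] = None
    guided_choice: Optional[List[str]] = None
    structural_tag: Optional[str] = None


def pick_constraint(p: GuidedParams) -> str:
    """Return which constraint type is active on the request."""
    if p.guided_json is not None:
        return "json"
    elif p.guided_regex is not None:
        return "regex"
    elif p.guided_grammar is not None:
        return "grammar"
    elif p.guided_choice is not None:  # the branch the comment asks for
        return "choice"
    elif p.structural_tag is not None:
        return "structural_tag"
    raise ValueError("no guided decoding constraint set")
```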
fastdeploy/engine/engine.py:1049
- The code references `self.cfg.reasoning_parser`, but `reasoning_parser` is not defined on the engine config object. It should likely reference `self.cfg.model_config.reasoning_parser`.
f" --reasoning_parser {self.cfg.reasoning_parser}")
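The suggested fix is a matter of reading the attribute from the nested model config rather than the top-level config. A toy reproduction, with illustrative config dataclasses and an illustrative parser name:

```python
# Toy reproduction of the nesting the comment describes: reasoning_parser
# lives on model_config, not on the top-level config object. "my_parser"
# and these dataclasses are illustrative, not FastDeploy's real config.
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    reasoning_parser: str = "my_parser"


@dataclass
class FDConfig:
    model_config: ModelConfig = field(default_factory=ModelConfig)


cfg = FDConfig()

# Wrong: cfg.reasoning_parser would raise AttributeError.
# Right: read it through model_config.
arg = f" --reasoning_parser {cfg.model_config.reasoning_parser}"
assert arg == " --reasoning_parser my_parser"
```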
fastdeploy/worker/vl_gpu_model_runner.py:152
- Using `request.get(...)` may not work if `request` is not a dict-like object. Consider using `getattr(request, 'enable_thinking', True)` to access the attribute safely.
enable_thinking=request.get("enable_thinking", True),
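The comment's point, as a standalone sketch: `.get` only exists on dict-like objects, while `getattr` with a default works on plain attribute-style objects. The `Request` class here is illustrative.

```python
# Illustrative sketch: why getattr(request, "enable_thinking", True) is the
# safer access pattern when the request may be a plain object, not a dict.
class Request:
    """Plain attribute-style request object (illustrative)."""
    pass


req = Request()

# req has no .get method, so req.get("enable_thinking", True) would raise
# AttributeError. getattr with a default works either way:
assert getattr(req, "enable_thinking", True) is True

# Once the attribute exists, getattr returns it instead of the default:
req.enable_thinking = False
assert getattr(req, "enable_thinking", True) is False
```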
force-pushed from aac8503 to 04c2f3c
force-pushed from 2ef373a to 69fc3a2
force-pushed from 69fc3a2 to 6bd3676
force-pushed from 0429910 to 3e9bba5
force-pushed from aec275d to 278d3bd