Adding Qwen3 and Qwen3MoE #36878
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the |
Will Qwen3 be implemented in SGLang?
@Swipe4057 we are working on |
HUGE 🚀 🚀 🚀 🚀 🚀 🚀 🚀
Super small comments:
- for the MoE, inheriting from Mixtral or QwenMoe to get the forward will be "simpler"
- the attention paradigm can inherit from Olmo2!
- just a question on whether the max sliding window should be enforced or not!
Missing:
- qwen3.md
- qwen3_moe.md
That's it!
A last nit: either rename the keys to make the difference explicit, or we just use the class that already exists! Happy to merge if you want 🤗
Merging from main should help with the CI!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ValueError: The following configurations don't contain any valid checkpoint:
Qwen3MoeConfig
The requirement is to include a link pointing to one of the models of this architecture in the docstring of the config classes listed above. The link should be in markdown format, like [myorg/mymodel](https://huggingface.co/myorg/mymodel). This can be ignored if you want (the repo consistency check).
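To illustrate the requirement quoted above, here is a minimal sketch of a docstring check of this kind. It is not the repository's actual consistency script: the regex, the helper names, and both config classes are hypothetical, and only the `[myorg/mymodel](https://huggingface.co/myorg/mymodel)` link format comes from the error message.

```python
import re

# Assumed pattern (hypothetical): a markdown link whose target is a Hugging Face Hub URL.
CHECKPOINT_LINK = re.compile(r"\[(.+?)\]\((https://huggingface\.co/.+?)\)")


class ExampleConfig:
    """
    Configuration for an illustrative model (not a real transformers class).

    Example checkpoint: [myorg/mymodel](https://huggingface.co/myorg/mymodel)
    """


class UndocumentedConfig:
    """Configuration whose docstring links to no checkpoint."""


def find_checkpoint(config_class):
    """Return the first checkpoint name whose link target matches it, else None."""
    for name, url in CHECKPOINT_LINK.findall(config_class.__doc__ or ""):
        # The link text and the URL path are expected to agree.
        if url.rstrip("/") == f"https://huggingface.co/{name}":
            return name
    return None


def configs_without_checkpoint(config_classes):
    """List the names of config classes that would fail the check."""
    return [cls.__name__ for cls in config_classes if find_checkpoint(cls) is None]
```

Under these assumptions, adding one markdown link to the config class docstring is enough to move a class out of the failing list.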
After merging from the main branch, I noticed that some tests are still failing. Could you help look into the reasons? Or are these tests non-critical for our PR?
@bozheng-hit should be good, the failing tests are unrelated. I simplified the modular file a little bit! Could you review and maybe update the README? Otherwise I can merge as is if it helps your release cycle!! 🤗
(Just waiting for your input to merge!)
Hi, I will revert your changes to Qwen3MoE since the model cannot be loaded correctly after incorporating your modifications. |
Mmm, okay, let me do another pass to fix the tests and make sure my changes don't prevent loading!
BTW, looping over the experts is not optimal; in the long term we'll see what we can do to standardize this and support fast MoE kernels.
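For readers unfamiliar with the pattern being discussed, the following is a rough, pure-Python sketch of what "looping over the experts" means in a mixture-of-experts layer. It is not the code from this PR: it uses plain lists instead of tensors, the function names are hypothetical, and renormalizing the top-k router probabilities is an assumption made for the illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(tokens, router_logits, experts, top_k):
    """Route each token to its top_k experts, then loop over the experts.

    tokens: list of float vectors.
    router_logits: for each token, one logit per expert.
    experts: list of callables mapping a vector to a vector of the same size.
    """
    # Routing: pick top_k experts per token and renormalize their probabilities
    # (the renormalization is an assumption of this sketch).
    assignments = [[] for _ in experts]  # expert index -> [(token index, weight)]
    for t, logits in enumerate(router_logits):
        probs = softmax(logits)
        chosen = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in chosen)
        for e in chosen:
            assignments[e].append((t, probs[e] / norm))

    # The expert loop: one sequential pass per expert over the tokens routed to it.
    outputs = [[0.0] * len(tok) for tok in tokens]
    for e, expert in enumerate(experts):
        for t, weight in assignments[e]:
            y = expert(tokens[t])
            outputs[t] = [acc + weight * v for acc, v in zip(outputs[t], y)]
    return outputs
```

The outer `for e, expert in enumerate(experts)` loop runs the experts one after another, which is the part a fused MoE kernel would aim to replace with a single batched computation.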
@bozheng-hit merged! Once you have an article or something we can also update the |
Thanks! We'll update the |
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
@bozheng-hit I want to say thank you for adding export support for the new Qwen3 model, making it ExecuTorch compatible on Day 1!
* Enable Qwen3 and Qwen3-MOE for openvino huggingface/transformers#36878
* Update optimum/exporters/openvino/model_configs.py Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
* Update optimum/exporters/openvino/model_configs.py Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
* Update optimum/exporters/openvino/model_configs.py Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
* add qwen3 test case
* add simplified chat template for qwen3
* Update optimum/exporters/openvino/model_configs.py Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* update chat template
* fix style
* Update tests/openvino/test_modeling.py
* update spda number
---------
Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Co-authored-by: Ella Charlaix <ella@huggingface.co>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Hi @bozheng-hit I am not able to find
which is used in this PR code. Is it |
Adding Qwen3
This PR adds code support for the upcoming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker