Conversation

bozheng-hit
Contributor

@bozheng-hit bozheng-hit commented Mar 21, 2025

Adding Qwen3

This PR adds code support for the upcoming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker

@github-actions github-actions bot marked this pull request as draft March 21, 2025 09:35

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@bozheng-hit bozheng-hit marked this pull request as ready for review March 21, 2025 10:38
@Swipe4057

Will Qwen3 be implemented in SGLang?
https://github.com/sgl-project/sglang

@ArthurZucker
Collaborator

@Swipe4057 we are working on a transformers backend for SGLang! So it should come quickly 😉

Collaborator

@ArthurZucker ArthurZucker left a comment


HUGE 🚀 🚀 🚀 🚀 🚀 🚀 🚀

Super small comments:

  • for the MoE, inheriting from Mixtral or QwenMoe to get the forward will be "simpler"
  • attention paradigm inheriting from Olmo2!
  • just a question on whether max sliding window should be enforced or not!

Missing:

  • qwen3.md
  • qwen3_moe.md

That's it!
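The first review point above — inheriting from an existing MoE block to get the forward — can be sketched like this. The class names and forward logic below are illustrative stand-ins, not the real transformers implementations:

```python
# Minimal sketch of the "inherit to get the forward" suggestion. These are
# hypothetical stand-in classes, not the actual transformers modules.

class MixtralSparseMoeBlock:
    """Existing MoE block: routing and expert dispatch implemented once."""

    def __init__(self, num_experts):
        self.num_experts = num_experts

    def forward(self, hidden_states):
        # The shared forward logic (routing, expert loop, ...) lives here.
        return [h * self.num_experts for h in hidden_states]


class Qwen3MoeSparseMoeBlock(MixtralSparseMoeBlock):
    """Defines no forward() of its own: it is inherited from the parent,
    so only the configuration differences need to be written."""

    def __init__(self):
        super().__init__(num_experts=8)
```

Keeping `forward()` out of the subclass is what makes the modular file "simpler": any fix to the parent's dispatch logic propagates automatically.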

Collaborator

@ArthurZucker ArthurZucker left a comment


One last nit: either rename the keys to make the difference explicit, or just use the class that already exists! Happy to merge if you want 🤗

@ArthurZucker
Collaborator

Merging from main should help with the ci!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker
Collaborator

ValueError: The following configurations don't contain any valid checkpoint:
Qwen3MoeConfig

The requirement is to include a link pointing to one of the models of this architecture in the docstring of the config classes listed above. The link should be in markdown format, like [myorg/mymodel](https://huggingface.co/myorg/mymodel).

this can be ignored if you want (the repo consistency check)
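The check behind that error can be illustrated roughly as follows. The regex is an assumption about what the repo-consistency check looks for, not the exact transformers implementation, and the config class here is a minimal stand-in:

```python
import re


class Qwen3MoeConfig:  # illustrative stand-in for the real config class
    """Configuration for a Qwen3 MoE model.

    Example checkpoint reference in the required markdown format:
    [myorg/mymodel](https://huggingface.co/myorg/mymodel)
    """


def has_checkpoint_link(cls):
    """Rough approximation (an assumption, not the exact transformers check):
    the config docstring must contain a markdown link of the form
    [org/model](https://huggingface.co/org/model)."""
    pattern = r"\[[\w./-]+/[\w.-]+\]\(https://huggingface\.co/[\w./-]+\)"
    return bool(re.search(pattern, cls.__doc__ or ""))
```

Adding one such link to the `Qwen3MoeConfig` docstring is what silences the `ValueError`.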

@bozheng-hit
Contributor Author

Merging from main should help with the ci!

After merging from the main branch, I noticed that some tests are still not passing as expected. Could you help take a look at the reasons? Or are these tests non-critical for our PR?

@ArthurZucker
Collaborator

@bozheng-hit should be good, the failing tests are unrelated. I simplified the modular a little bit!

Could you review and maybe update the readme, otherwise I can merge as is if it helps your release cycle!! 🤗

@ArthurZucker
Collaborator

(Just waiting for your input to merge!)

@bozheng-hit
Contributor Author

(Just waiting for your input to merge!)

Hi, I will revert your changes to Qwen3MoE since the model cannot be loaded correctly after incorporating your modifications.

@ArthurZucker
Collaborator

Mmm Okay let me do another pass to fix the tests / make sure my changes don't prevent loading!

@ArthurZucker
Collaborator

BTW, looping over the experts is not optimal; in the long term we'll see what we can do to standardize this and support fast MoE kernels.
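A toy illustration of that per-expert loop (NumPy, with made-up shapes; the actual implementation in the PR uses torch modules): each expert runs only on the tokens routed to it, which forces a Python loop over all experts — the pattern a fused MoE kernel would replace.

```python
import numpy as np


def moe_forward_loop(x, gate_w, expert_ws, top_k=2):
    """Naive MoE forward: route each token to its top-k experts, then loop
    over experts, running each one only on the tokens assigned to it."""
    logits = x @ gate_w                               # (tokens, n_experts)
    top = np.argsort(-logits, axis=-1)[:, :top_k]     # top-k expert indices
    # Softmax over the selected experts' logits only.
    sel = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(sel - sel.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for e, w_e in enumerate(expert_ws):               # the per-expert loop
        token_idx, slot_idx = np.nonzero(top == e)    # tokens routed to expert e
        if token_idx.size == 0:
            continue
        out[token_idx] += weights[token_idx, slot_idx, None] * (x[token_idx] @ w_e)
    return out
```

Because the routing weights for each token sum to one, feeding identity experts returns the input unchanged, which makes the sketch easy to sanity-check.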

@ArthurZucker ArthurZucker merged commit 6acd5ae into huggingface:main Mar 31, 2025
15 of 18 checks passed
@ArthurZucker
Collaborator

@bozheng-hit merged! Once you have an article or something we can also update the .md but not urgent! 🤗

@bozheng-hit
Contributor Author

@bozheng-hit merged! Once you have an article or something we can also update the .md but not urgent! 🤗

Thanks! We'll update the .md file to coincide with the official model launch! 🚀

dmdaksh pushed a commit to dmdaksh/transformers that referenced this pull request Apr 2, 2025
* Initial commit for Qwen3

* fix and add tests for qwen3 & qwen3_moe

* rename models for tests.

* fix

* fix

* fix and add docs.

* fix model name in docs.

* simplify modular and fix configuration issues

* Fix the red CI: ruff was updated

* revert ruff, version was wrong

* fix qwen3moe.

* fix

* make sure MOE can load

* fix copies

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
zucchini-nlp pushed a commit to BakerBunker/transformers that referenced this pull request Apr 2, 2025
@guangy10
Contributor

guangy10 commented Apr 2, 2025

@bozheng-hit I want to say thank you for adding export support for the new Qwen3 model, making it ExecuTorch compatible on day 1!

cc: @mergennachin @tugsbayasgalan @kimishpatel @cbilgin

echarlaix added a commit to huggingface/optimum-intel that referenced this pull request Apr 29, 2025
* Enable Qwen3 and Qwen3-MOE for openvino

huggingface/transformers#36878

* Update optimum/exporters/openvino/model_configs.py

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* Update optimum/exporters/openvino/model_configs.py

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* Update optimum/exporters/openvino/model_configs.py

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* add qwen3 test case

* add simplified chat template for qwen3

* Update optimum/exporters/openvino/model_configs.py

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

* update chat template

* fix style

* Update tests/openvino/test_modeling.py

* update sdpa number

---------

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Co-authored-by: Ella Charlaix <ella@huggingface.co>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
@ydshieh
Collaborator

ydshieh commented Jun 17, 2025

Hi @bozheng-hit

I am not able to find

Qwen/Qwen3-15B-A2B-Base

which is used in this PR code.

Is it Qwen/Qwen3-30B-A3B-Base instead?

@ydshieh ydshieh mentioned this pull request Jun 17, 2025
soghomon-b pushed a commit to soghomon-b/transformers that referenced this pull request Aug 24, 2025
7 participants