Conversation

bozheng-hit
Contributor

@bozheng-hit bozheng-hit commented Mar 21, 2025

Adding Qwen3

This PR adds code support for the upcoming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker

@github-actions github-actions bot marked this pull request as draft March 21, 2025 09:35

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@bozheng-hit bozheng-hit marked this pull request as ready for review March 21, 2025 10:38
@Swipe4057

Will Qwen3 be implemented in SGLang?
https://github.com/sgl-project/sglang

@ArthurZucker
Collaborator

@Swipe4057 we are working on a transformers backend for SGLang! So it should come quickly 😉

Collaborator

@ArthurZucker ArthurZucker left a comment


HUGE 🚀 🚀 🚀 🚀 🚀 🚀 🚀

Super small comments:

  • for the MoE, inheriting from Mixtral or QwenMoe to get the forward will be "simpler"
  • attention paradigm inheriting from Olmo2!
  • just a question on whether max sliding window should be enforced or not!

Missing:

  • qwen3.md
  • qwen3_moe.md

That's it!
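The first review point above — inheriting from an existing MoE block to get the forward — can be sketched like this. The class names and forward logic below are illustrative stand-ins, not the real transformers implementations:

```python
# Minimal sketch of the "inherit to get the forward" suggestion. These are
# hypothetical stand-in classes, not the actual transformers modules.

class MixtralSparseMoeBlock:
    """Existing MoE block: routing and expert dispatch implemented once."""

    def __init__(self, num_experts):
        self.num_experts = num_experts

    def forward(self, hidden_states):
        # The shared forward logic (routing, expert loop, ...) lives here.
        return [h * self.num_experts for h in hidden_states]


class Qwen3MoeSparseMoeBlock(MixtralSparseMoeBlock):
    """Defines no forward() of its own: it is inherited from the parent,
    so only the configuration differences need to be written."""

    def __init__(self):
        super().__init__(num_experts=8)
```

Keeping `forward()` out of the subclass is what makes the modular file "simpler": any fix to the parent's dispatch logic propagates automatically.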

Collaborator

@ArthurZucker ArthurZucker left a comment


One last nit: either rename the keys to make the difference explicit, or just use the class that already exists! Happy to merge if you want 🤗

@ArthurZucker
Collaborator

Merging from main should help with the ci!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker
Collaborator

ValueError: The following configurations don't contain any valid checkpoint:
Qwen3MoeConfig

The requirement is to include a link pointing to one of the models of this architecture in the docstring of the config classes listed above. The link should be in markdown format, like [myorg/mymodel](https://huggingface.co/myorg/mymodel).

this can be ignored if you want (the repo consistency check)
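The check behind that error can be illustrated roughly as follows. The regex is an assumption about what the repo-consistency check looks for, not the exact transformers implementation, and the config class here is a minimal stand-in:

```python
import re


class Qwen3MoeConfig:  # illustrative stand-in for the real config class
    """Configuration for a Qwen3 MoE model.

    Example checkpoint reference in the required markdown format:
    [myorg/mymodel](https://huggingface.co/myorg/mymodel)
    """


def has_checkpoint_link(cls):
    """Rough approximation (an assumption, not the exact transformers check):
    the config docstring must contain a markdown link of the form
    [org/model](https://huggingface.co/org/model)."""
    pattern = r"\[[\w./-]+/[\w.-]+\]\(https://huggingface\.co/[\w./-]+\)"
    return bool(re.search(pattern, cls.__doc__ or ""))
```

Adding one such link to the `Qwen3MoeConfig` docstring is what silences the `ValueError`.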

@bozheng-hit
Contributor Author

Merging from main should help with the ci!

After merging from the main branch, I noticed that some tests are still not passing as expected. Could you help take a look at the reasons? Or are these tests non-critical for our PR?

@ArthurZucker
Collaborator

@bozheng-hit should be good, the failing tests are unrelated. I simplified the modular a little bit!

Could you review and maybe update the readme, otherwise I can merge as is if it helps your release cycle!! 🤗

@ArthurZucker
Collaborator

(Just waiting for your input to merge!)

@bozheng-hit
Contributor Author

(Just waiting for your input to merge!)

Hi, I will revert your changes to Qwen3MoE since the model cannot be loaded correctly after incorporating your modifications.

@ArthurZucker
Collaborator

Mmm Okay let me do another pass to fix the tests / make sure my changes don't prevent loading!

@ArthurZucker
Collaborator

BTW, looping over the experts is not optimal; in the long term we'll see what we can do to standardize this and support fast MoE kernels.
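A toy illustration of that per-expert loop (NumPy, with made-up shapes; the actual implementation in the PR uses torch modules): each expert runs only on the tokens routed to it, which forces a Python loop over all experts — the pattern a fused MoE kernel would replace.

```python
import numpy as np


def moe_forward_loop(x, gate_w, expert_ws, top_k=2):
    """Naive MoE forward: route each token to its top-k experts, then loop
    over experts, running each one only on the tokens assigned to it."""
    logits = x @ gate_w                               # (tokens, n_experts)
    top = np.argsort(-logits, axis=-1)[:, :top_k]     # top-k expert indices
    # Softmax over the selected experts' logits only.
    sel = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(sel - sel.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for e, w_e in enumerate(expert_ws):               # the per-expert loop
        token_idx, slot_idx = np.nonzero(top == e)    # tokens routed to expert e
        if token_idx.size == 0:
            continue
        out[token_idx] += weights[token_idx, slot_idx, None] * (x[token_idx] @ w_e)
    return out
```

Because the routing weights for each token sum to one, feeding identity experts returns the input unchanged, which makes the sketch easy to sanity-check.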

@ArthurZucker ArthurZucker merged commit 6acd5ae into huggingface:main Mar 31, 2025
15 of 18 checks passed
@ArthurZucker
Collaborator

@bozheng-hit merged! Once you have an article or something we can also update the .md but not urgent! 🤗

@bozheng-hit
Contributor Author

@bozheng-hit merged! Once you have an article or something we can also update the .md but not urgent! 🤗

Thanks! We'll update the .md file to coincide with the official model launch! 🚀

dmdaksh pushed a commit to dmdaksh/transformers that referenced this pull request Apr 2, 2025
* Initial commit for Qwen3

* fix and add tests for qwen3 & qwen3_moe

* rename models for tests.

* fix

* fix

* fix and add docs.

* fix model name in docs.

* simplify modular and fix configuration issues

* Fix the red CI: ruff was updated

* revert ruff, version was wrong

* fix qwen3moe.

* fix

* make sure MOE can load

* fix copies

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
zucchini-nlp pushed a commit to BakerBunker/transformers that referenced this pull request Apr 2, 2025
@guangy10
Contributor

guangy10 commented Apr 2, 2025

@bozheng-hit I want to say thank you for adding export support for the new Qwen3 model, making it ExecuTorch compatible on day 1!

cc: @mergennachin @tugsbayasgalan @kimishpatel @cbilgin

echarlaix added a commit to huggingface/optimum-intel that referenced this pull request Apr 29, 2025
* Enable Qwen3 and Qwen3-MOE for openvino

huggingface/transformers#36878

* Update optimum/exporters/openvino/model_configs.py

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* Update optimum/exporters/openvino/model_configs.py

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* Update optimum/exporters/openvino/model_configs.py

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* add qwen3 test case

* add simplified chat template for qwen3

* Update optimum/exporters/openvino/model_configs.py

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

* update chat template

* fix style

* Update tests/openvino/test_modeling.py

* update sdpa number

---------

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Co-authored-by: Ella Charlaix <ella@huggingface.co>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
@ydshieh
Collaborator

ydshieh commented Jun 17, 2025

Hi @bozheng-hit

I am not able to find

Qwen/Qwen3-15B-A2B-Base

which is used in this PR code.

Is it Qwen/Qwen3-30B-A3B-Base instead?

@ydshieh ydshieh mentioned this pull request Jun 17, 2025
soghomon-b pushed a commit to soghomon-b/transformers that referenced this pull request Aug 24, 2025
7 participants