Commit 3e5b76d

bozheng-hit authored and ArthurZucker committed

Adding Qwen3 and Qwen3MoE (huggingface#36878)

* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>

1 parent 2c8b32f · commit 3e5b76d

26 files changed: +5650 −3 lines

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions

```diff
@@ -603,6 +603,10 @@
       title: Qwen2
     - local: model_doc/qwen2_moe
       title: Qwen2MoE
+    - local: model_doc/qwen3
+      title: Qwen3
+    - local: model_doc/qwen3_moe
+      title: Qwen3MoE
     - local: model_doc/rag
       title: RAG
     - local: model_doc/realm
```

docs/source/en/index.md

Lines changed: 0 additions & 1 deletion

```diff
@@ -43,4 +43,3 @@ Transformers is designed for developers and machine learning engineers and resea
   </a>
 </div>
 
-Join us on the Hugging Face [Hub](https://huggingface.co/), [Discord](https://discord.com/invite/JfAtkvEtRb), or [forum](https://discuss.huggingface.co/) to collaborate and build models, datasets, and applications together.
```

docs/source/en/model_doc/qwen3.md

Lines changed: 59 additions & 0 deletions

```diff
@@ -0,0 +1,59 @@
+<!--Copyright 2024 The Qwen Team and The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Qwen3
+
+## Overview
+
+To be released with the official model launch.
+
+### Model Details
+
+To be released with the official model launch.
+
+
+## Usage tips
+
+To be released with the official model launch.
+
+## Qwen3Config
+
+[[autodoc]] Qwen3Config
+
+## Qwen3Model
+
+[[autodoc]] Qwen3Model
+    - forward
+
+## Qwen3ForCausalLM
+
+[[autodoc]] Qwen3ForCausalLM
+    - forward
+
+## Qwen3ForSequenceClassification
+
+[[autodoc]] Qwen3ForSequenceClassification
+    - forward
+
+## Qwen3ForTokenClassification
+
+[[autodoc]] Qwen3ForTokenClassification
+    - forward
+
+## Qwen3ForQuestionAnswering
+
+[[autodoc]] Qwen3ForQuestionAnswering
+    - forward
```

docs/source/en/model_doc/qwen3_moe.md

Lines changed: 58 additions & 0 deletions

```diff
@@ -0,0 +1,58 @@
+<!--Copyright 2024 The Qwen Team and The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Qwen3MoE
+
+## Overview
+
+To be released with the official model launch.
+
+### Model Details
+
+To be released with the official model launch.
+
+## Usage tips
+
+To be released with the official model launch.
+
+## Qwen3MoeConfig
+
+[[autodoc]] Qwen3MoeConfig
+
+## Qwen3MoeModel
+
+[[autodoc]] Qwen3MoeModel
+    - forward
+
+## Qwen3MoeForCausalLM
+
+[[autodoc]] Qwen3MoeForCausalLM
+    - forward
+
+## Qwen3MoeForSequenceClassification
+
+[[autodoc]] Qwen3MoeForSequenceClassification
+    - forward
+
+## Qwen3MoeForTokenClassification
+
+[[autodoc]] Qwen3MoeForTokenClassification
+    - forward
+
+## Qwen3MoeForQuestionAnswering
+
+[[autodoc]] Qwen3MoeForQuestionAnswering
+    - forward
```

src/transformers/__init__.py

Lines changed: 40 additions & 0 deletions

```diff
@@ -744,6 +744,8 @@
         "Qwen2VLConfig",
         "Qwen2VLProcessor",
     ],
+    "models.qwen3": ["Qwen3Config"],
+    "models.qwen3_moe": ["Qwen3MoeConfig"],
     "models.rag": ["RagConfig", "RagRetriever", "RagTokenizer"],
     "models.recurrent_gemma": ["RecurrentGemmaConfig"],
     "models.reformer": ["ReformerConfig"],
@@ -3441,6 +3443,26 @@
             "Qwen2VLPreTrainedModel",
         ]
     )
+    _import_structure["models.qwen3"].extend(
+        [
+            "Qwen3ForCausalLM",
+            "Qwen3ForQuestionAnswering",
+            "Qwen3ForSequenceClassification",
+            "Qwen3ForTokenClassification",
+            "Qwen3Model",
+            "Qwen3PreTrainedModel",
+        ]
+    )
+    _import_structure["models.qwen3_moe"].extend(
+        [
+            "Qwen3MoeForCausalLM",
+            "Qwen3MoeForQuestionAnswering",
+            "Qwen3MoeForSequenceClassification",
+            "Qwen3MoeForTokenClassification",
+            "Qwen3MoeModel",
+            "Qwen3MoePreTrainedModel",
+        ]
+    )
     _import_structure["models.rag"].extend(
         [
             "RagModel",
@@ -5993,6 +6015,8 @@
         Qwen2VLConfig,
         Qwen2VLProcessor,
     )
+    from .models.qwen3 import Qwen3Config
+    from .models.qwen3_moe import Qwen3MoeConfig
    from .models.rag import RagConfig, RagRetriever, RagTokenizer
    from .models.recurrent_gemma import RecurrentGemmaConfig
    from .models.reformer import ReformerConfig
@@ -8293,6 +8317,22 @@
        Qwen2VLModel,
        Qwen2VLPreTrainedModel,
    )
+    from .models.qwen3 import (
+        Qwen3ForCausalLM,
+        Qwen3ForQuestionAnswering,
+        Qwen3ForSequenceClassification,
+        Qwen3ForTokenClassification,
+        Qwen3Model,
+        Qwen3PreTrainedModel,
+    )
+    from .models.qwen3_moe import (
+        Qwen3MoeForCausalLM,
+        Qwen3MoeForQuestionAnswering,
+        Qwen3MoeForSequenceClassification,
+        Qwen3MoeForTokenClassification,
+        Qwen3MoeModel,
+        Qwen3MoePreTrainedModel,
+    )
     from .models.rag import (
         RagModel,
         RagPreTrainedModel,
```
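The `_import_structure` entries above feed transformers' lazy-import machinery: a plain dict maps submodule paths to the names they export, and the actual import is deferred until a name is first accessed. A minimal sketch of that name-to-submodule resolution, assuming a toy registry (the `resolve` helper is illustrative, not the library's implementation):

```python
# Hypothetical registry in the style of transformers' _import_structure:
# submodule path -> public names that submodule exports.
_import_structure = {
    "models.qwen3": ["Qwen3Config"],
    "models.qwen3_moe": ["Qwen3MoeConfig"],
}

# Reverse map so an exported name can be traced back to its submodule
# without importing anything up front.
_name_to_module = {
    name: module for module, names in _import_structure.items() for name in names
}

def resolve(name: str) -> str:
    """Return the submodule to import for `name` (sketch of lazy dispatch)."""
    if name not in _name_to_module:
        raise AttributeError(f"module has no attribute {name!r}")
    return _name_to_module[name]
```

In the real package this lookup happens inside a module `__getattr__`, so `from transformers import Qwen3Config` only pays the import cost of `models.qwen3`.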

src/transformers/models/__init__.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -230,6 +230,8 @@
     qwen2_audio,
     qwen2_moe,
     qwen2_vl,
+    qwen3,
+    qwen3_moe,
     rag,
     recurrent_gemma,
     reformer,
```

src/transformers/models/auto/configuration_auto.py

Lines changed: 4 additions & 0 deletions

```diff
@@ -254,6 +254,8 @@
         ("qwen2_audio_encoder", "Qwen2AudioEncoderConfig"),
         ("qwen2_moe", "Qwen2MoeConfig"),
         ("qwen2_vl", "Qwen2VLConfig"),
+        ("qwen3", "Qwen3Config"),
+        ("qwen3_moe", "Qwen3MoeConfig"),
         ("rag", "RagConfig"),
         ("realm", "RealmConfig"),
         ("recurrent_gemma", "RecurrentGemmaConfig"),
@@ -609,6 +611,8 @@
         ("qwen2_audio_encoder", "Qwen2AudioEncoder"),
         ("qwen2_moe", "Qwen2MoE"),
         ("qwen2_vl", "Qwen2VL"),
+        ("qwen3", "Qwen3"),
+        ("qwen3_moe", "Qwen3MoE"),
         ("rag", "RAG"),
         ("realm", "REALM"),
         ("recurrent_gemma", "RecurrentGemma"),
```
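The two mappings above drive `AutoConfig`: the first takes the `model_type` string from a checkpoint's `config.json` to a config class name, the second to a human-readable model name. A toy version of that dispatch, assuming small excerpts of both tables (`config_class_for` is an illustrative stand-in, not the AutoConfig API):

```python
# Excerpts of the two registries extended by this commit.
CONFIG_MAPPING_NAMES = {
    "qwen2_moe": "Qwen2MoeConfig",
    "qwen2_vl": "Qwen2VLConfig",
    "qwen3": "Qwen3Config",
    "qwen3_moe": "Qwen3MoeConfig",
}
MODEL_NAMES_MAPPING = {
    "qwen3": "Qwen3",
    "qwen3_moe": "Qwen3MoE",
}

def config_class_for(model_type: str) -> str:
    """AutoConfig-style dispatch: unknown model types fail loudly."""
    try:
        return CONFIG_MAPPING_NAMES[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model type: {model_type!r}") from None
```

The key detail the diff shows is that `qwen3` and `qwen3_moe` are distinct model types with distinct config classes, even though (per the tokenizer mapping below) they share a tokenizer.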

src/transformers/models/auto/modeling_auto.py

Lines changed: 10 additions & 0 deletions

```diff
@@ -233,6 +233,8 @@
         ("qwen2_audio_encoder", "Qwen2AudioEncoder"),
         ("qwen2_moe", "Qwen2MoeModel"),
         ("qwen2_vl", "Qwen2VLModel"),
+        ("qwen3", "Qwen3Model"),
+        ("qwen3_moe", "Qwen3MoeModel"),
         ("recurrent_gemma", "RecurrentGemmaModel"),
         ("reformer", "ReformerModel"),
         ("regnet", "RegNetModel"),
@@ -576,6 +578,8 @@
         ("qdqbert", "QDQBertLMHeadModel"),
         ("qwen2", "Qwen2ForCausalLM"),
         ("qwen2_moe", "Qwen2MoeForCausalLM"),
+        ("qwen3", "Qwen3ForCausalLM"),
+        ("qwen3_moe", "Qwen3MoeForCausalLM"),
         ("recurrent_gemma", "RecurrentGemmaForCausalLM"),
         ("reformer", "ReformerModelWithLMHead"),
         ("rembert", "RemBertForCausalLM"),
@@ -1072,6 +1076,8 @@
         ("qdqbert", "QDQBertForSequenceClassification"),
         ("qwen2", "Qwen2ForSequenceClassification"),
         ("qwen2_moe", "Qwen2MoeForSequenceClassification"),
+        ("qwen3", "Qwen3ForSequenceClassification"),
+        ("qwen3_moe", "Qwen3MoeForSequenceClassification"),
         ("reformer", "ReformerForSequenceClassification"),
         ("rembert", "RemBertForSequenceClassification"),
         ("roberta", "RobertaForSequenceClassification"),
@@ -1153,6 +1159,8 @@
         ("qdqbert", "QDQBertForQuestionAnswering"),
         ("qwen2", "Qwen2ForQuestionAnswering"),
         ("qwen2_moe", "Qwen2MoeForQuestionAnswering"),
+        ("qwen3", "Qwen3ForQuestionAnswering"),
+        ("qwen3_moe", "Qwen3MoeForQuestionAnswering"),
         ("reformer", "ReformerForQuestionAnswering"),
         ("rembert", "RemBertForQuestionAnswering"),
         ("roberta", "RobertaForQuestionAnswering"),
@@ -1257,6 +1265,8 @@
         ("qdqbert", "QDQBertForTokenClassification"),
         ("qwen2", "Qwen2ForTokenClassification"),
         ("qwen2_moe", "Qwen2MoeForTokenClassification"),
+        ("qwen3", "Qwen3ForTokenClassification"),
+        ("qwen3_moe", "Qwen3MoeForTokenClassification"),
         ("rembert", "RemBertForTokenClassification"),
         ("roberta", "RobertaForTokenClassification"),
         ("roberta-prelayernorm", "RobertaPreLayerNormForTokenClassification"),
```
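Each hunk above extends a different task-specific mapping: base model, causal LM, sequence classification, question answering, and token classification. The `AutoModelFor*` classes effectively do a two-key lookup, task first, then `model_type`. A compact sketch over class names only, as a stand-in for the real class registries (the task keys here are illustrative labels, not library identifiers):

```python
# Toy two-level registry: task label -> model_type -> head class name.
MODEL_MAPPINGS = {
    "base": {"qwen3": "Qwen3Model", "qwen3_moe": "Qwen3MoeModel"},
    "causal-lm": {"qwen3": "Qwen3ForCausalLM",
                  "qwen3_moe": "Qwen3MoeForCausalLM"},
    "sequence-classification": {"qwen3": "Qwen3ForSequenceClassification",
                                "qwen3_moe": "Qwen3MoeForSequenceClassification"},
    "question-answering": {"qwen3": "Qwen3ForQuestionAnswering",
                           "qwen3_moe": "Qwen3MoeForQuestionAnswering"},
    "token-classification": {"qwen3": "Qwen3ForTokenClassification",
                             "qwen3_moe": "Qwen3MoeForTokenClassification"},
}

def auto_class_name(task: str, model_type: str) -> str:
    """Mirror AutoModelFor* dispatch: task picks a mapping, model_type picks a class."""
    return MODEL_MAPPINGS[task][model_type]
```

This is why a checkpoint whose `config.json` says `"model_type": "qwen3_moe"` loads `Qwen3MoeForCausalLM` under `AutoModelForCausalLM` without the caller naming the class.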

src/transformers/models/auto/tokenization_auto.py

Lines changed: 14 additions & 0 deletions

```diff
@@ -454,6 +454,20 @@
             ),
         ),
         ("qwen2_vl", ("Qwen2Tokenizer", "Qwen2TokenizerFast" if is_tokenizers_available() else None)),
+        (
+            "qwen3",
+            (
+                "Qwen2Tokenizer",
+                "Qwen2TokenizerFast" if is_tokenizers_available() else None,
+            ),
+        ),
+        (
+            "qwen3_moe",
+            (
+                "Qwen2Tokenizer",
+                "Qwen2TokenizerFast" if is_tokenizers_available() else None,
+            ),
+        ),
         ("rag", ("RagTokenizer", None)),
         ("realm", ("RealmTokenizer", "RealmTokenizerFast" if is_tokenizers_available() else None)),
```
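Both new entries reuse the Qwen2 tokenizer pair, with the fast variant gated on `is_tokenizers_available()`. The slow/fast selection can be sketched like this, with the availability flag passed in explicitly as a stand-in for the library's probe of the `tokenizers` package:

```python
# Excerpt of the mapping: model_type -> (slow tokenizer, fast tokenizer).
TOKENIZER_MAPPING_NAMES = {
    "qwen3": ("Qwen2Tokenizer", "Qwen2TokenizerFast"),
    "qwen3_moe": ("Qwen2Tokenizer", "Qwen2TokenizerFast"),
}

def pick_tokenizer(model_type: str, tokenizers_available: bool,
                   use_fast: bool = True) -> str:
    """Prefer the Rust-backed fast tokenizer when the `tokenizers` package is installed."""
    slow, fast = TOKENIZER_MAPPING_NAMES[model_type]
    if not tokenizers_available:
        fast = None  # mirrors the `if is_tokenizers_available() else None` guard
    return fast if (use_fast and fast is not None) else slow
```

Reusing `Qwen2Tokenizer` means no new tokenizer code ships in this commit; only the mapping rows are added.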

src/transformers/models/qwen2_moe/configuration_qwen2_moe.py

Lines changed: 1 addition & 2 deletions

```diff
@@ -26,8 +26,7 @@ class Qwen2MoeConfig(PretrainedConfig):
     r"""
     This is the configuration class to store the configuration of a [`Qwen2MoeModel`]. It is used to instantiate a
     Qwen2MoE model according to the specified arguments, defining the model architecture. Instantiating a configuration
-    with the defaults will yield a similar configuration to that of
-    Qwen1.5-MoE-A2.7B" [Qwen/Qwen1.5-MoE-A2.7B"](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B").
+    with the defaults will yield a similar configuration to that of [Qwen/Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B).
 
     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
     documentation from [`PretrainedConfig`] for more information.
```
