Speed up shared expert weight construction by avoiding cloning (sgl-project#5188)

fzyzcjy · tarinkk · commit 3c1ac5fa2908 · 2025-04-21T06:28:04.000Z
diff --git a/python/sglang/srt/models/deepseek_v2.py b/python/sglang/srt/models/deepseek_v2.py
@@ -1628,7 +1628,7 @@ def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
                                 f"mlp.experts."
                                 f"{self.config.n_routed_experts + num_repeat}"
                                 f".{suffix}",
-                                weights_dict[shared_expert_weight_name].clone(),
+                                weights_dict[shared_expert_weight_name],
                             )
                         )
                         names_to_remove += [shared_expert_weight_name]
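
The change drops `.clone()`, so every repeated shared-expert slot now references the same source tensor instead of receiving its own copy. That removes one full-tensor allocation and copy per repeat. The sketch below models only the aliasing behaviour and is not the sglang code. It uses a plain Python list as a stand-in for `torch.Tensor`. The function `expand_shared_expert` and its `prefix`, `n_repeats` and `copy` parameters are hypothetical; the real name prefix is not shown in the hunk. Aliasing is presumed safe because the downstream loader only reads each weight, copying it into the target expert's parameter, and never mutates it in place. That is an assumption, not something the diff itself shows.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for torch.Tensor; only object identity matters here.
Weight = List[float]


def expand_shared_expert(
    weights_dict: Dict[str, Weight],
    shared_name: str,
    prefix: str,
    suffix: str,
    n_routed_experts: int,
    n_repeats: int,
    copy: bool,
) -> List[Tuple[str, Weight]]:
    """Emit one (name, weight) pair per repeated shared-expert slot.

    Slots are numbered after the routed experts, mirroring the
    f"mlp.experts.{n_routed_experts + num_repeat}.{suffix}" pattern in the diff.
    """
    out: List[Tuple[str, Weight]] = []
    src = weights_dict[shared_name]
    for num_repeat in range(n_repeats):
        # copy=True mimics the old `.clone()`: a fresh buffer per slot.
        # copy=False mimics the new code: every slot aliases the source.
        w = list(src) if copy else src
        name = f"{prefix}mlp.experts.{n_routed_experts + num_repeat}.{suffix}"
        out.append((name, w))
    return out


# Usage: two repeated slots after 256 routed experts (illustrative values).
weights = {"shared": [0.5, -1.0]}
old = expand_shared_expert(weights, "shared", "model.layers.3.", "down_proj.weight", 256, 2, copy=True)
new = expand_shared_expert(weights, "shared", "model.layers.3.", "down_proj.weight", 256, 2, copy=False)
```

With `copy=False`, memory traffic for the shared expert stays constant regardless of how many slots it is replicated into. The cost is that any in-place write to one slot's weight would be visible in all of them.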
