Fix sglang rollout batch mismatch issue #1387

SwordFaith · 2025-05-04T16:39:45Z

Checklist Before Starting

Search for similar PR(s).

What does this PR do?

Add one-line overview of what this PR aims to achieve or accomplish.

Fix the tools_kwargs batch-size mismatch that occurs when running GRPO with the SGLang rollout, following junrong's practice in the vllm_spmd rollout and #1385.

close #1380
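To illustrate the mismatch, here is a minimal NumPy-only sketch, not the actual patch. It assumes that `tools_kwargs` is stored as a per-prompt object array in `non_tensor_batch` and that the remedy is the same `np.repeat(..., n, axis=0)` pattern already applied to `multi_modal_inputs`. The array contents and the `prompt_index` key are made up for illustration.

```python
import numpy as np

n = 5            # matches actor_rollout_ref.rollout.n=5 in the usage example
num_prompts = 4  # illustrative; the real run uses data.train_batch_size=1024

# Hypothetical per-prompt tool arguments, stored as a 1-D object array.
tools_kwargs = np.array(
    [{"prompt_index": i} for i in range(num_prompts)], dtype=object
)

# With n > 1 the rollout produces n responses per prompt.
batch_size = num_prompts * n

# Left unrepeated, the per-prompt array is shorter than the response batch.
assert len(tools_kwargs) != batch_size

# np.repeat along axis 0 duplicates each entry n times consecutively
# (prompt 0 x n, prompt 1 x n, ...), so sample j maps to prompt j // n.
repeated = np.repeat(tools_kwargs, n, axis=0)
assert len(repeated) == batch_size
assert repeated[n - 1]["prompt_index"] == 0
assert repeated[n]["prompt_index"] == 1
```

The consecutive ordering matters: it keeps each repeated entry aligned with the responses sampled from the same prompt, provided the tensor batch is expanded in the same per-prompt order.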

Usage Example

Provide usage example(s) for easier usage.

Follow the reproduction method in #1380:

set -x

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_fsdp_comparison' \
    trainer.experiment_name='fsdp_1' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=20 \
    trainer.test_freq=5 \
    trainer.total_epochs=1 "$@"

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

Additional Info.

Issue Number: Fixes #1380 (Running GRPO example on latest commit yields tools_kwargs length mismatch)
Training: both
Inference: SGLang

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title if it breaks any API.
Update the documentation about your changes in the docs.
Add CI test(s) if necessary.

eric-haibin-lin · 2025-05-04T21:19:56Z

verl/workers/rollout/sglang_rollout/sglang_rollout.py

@@ -320,6 +320,9 @@ def generate_sequences(self, prompts: DataProto, **kwargs) -> DataProto:
                batch_size = batch_size * self.sampling_params["n"]
                if "multi_modal_inputs" in non_tensor_batch.keys():
                    non_tensor_batch["multi_modal_inputs"] = np.repeat(non_tensor_batch["multi_modal_inputs"], self.sampling_params["n"], axis=0)


Shall we add a test that reproduces this issue?
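One possible shape for such a test, as a rough sketch only. `repeat_non_tensor_batch` is a hypothetical helper written for this example and is not a function in verl. The test checks the length and ordering invariant on plain NumPy arrays and does not exercise `generate_sequences` itself, which would need a running SGLang engine.

```python
import numpy as np


def repeat_non_tensor_batch(non_tensor_batch, n):
    """Hypothetical helper: repeat every per-prompt array n times along axis 0."""
    return {
        key: np.repeat(value, n, axis=0)
        for key, value in non_tensor_batch.items()
    }


def test_non_tensor_batch_matches_repeated_batch_size():
    n, num_prompts = 5, 3
    # Made-up payloads; only the array lengths and ordering matter here.
    non_tensor_batch = {
        "tools_kwargs": np.array(
            [{"id": i} for i in range(num_prompts)], dtype=object
        ),
        "multi_modal_inputs": np.array(
            [{"image": i} for i in range(num_prompts)], dtype=object
        ),
    }

    out = repeat_non_tensor_batch(non_tensor_batch, n)

    # Every entry must have one item per generated sample.
    for key, value in out.items():
        assert len(value) == num_prompts * n, key

    # Sample j must still carry the kwargs of prompt j // n.
    ids = [kw["id"] for kw in out["tools_kwargs"]]
    assert ids == [j // n for j in range(num_prompts * n)]
```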

Fix sglang rollout batch mismatch issue

17eba09

SwordFaith mentioned this pull request May 4, 2025

Running GRPO example on latest commit yields tools_kwargs length mismatch #1380

Closed

eric-haibin-lin reviewed May 4, 2025

SwordFaith closed this May 5, 2025
