
[AMD] Remove the deprecated C10_WARP_SIZE #9356


Merged: 2 commits merged into sgl-project:main on Aug 22, 2025

Conversation

hubertlu-tw (Collaborator)

Motivation

C10_WARP_SIZE is deprecated in newer PyTorch versions.

Modifications

Use the GFX architecture to determine the warp size (WARP_SIZE) for ROCm; a minimal before/after sketch follows.
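
For context, here is a minimal before/after sketch of what the change amounts to. The "before" form is assumed from the PR description and the removed c10/macros/Macros.h include; the "after" block is the one quoted verbatim in the review thread below.

// Before (assumed form; WARP_SIZE came from PyTorch's c10 macros):
//   #include <c10/macros/Macros.h>
//   #define WARP_SIZE C10_WARP_SIZE
//
// After (compile-time switch on the GFX architecture, no c10 dependency):
#if defined(__GFX9__) || !defined(__HIP_DEVICE_COMPILE__)
#define WARP_SIZE 64  // GFX9 (CDNA, e.g., MI200/MI300) wavefronts are 64 lanes wide
#else
#define WARP_SIZE 32  // other device targets (e.g., RDNA wave32)
#endif

The !defined(__HIP_DEVICE_COMPILE__) branch means the host-side compilation pass always sees the 64-lane value.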

Accuracy Tests

Benchmarking and Profiling

Checklist

CC: @sogalin @HaiShaw @zhyncs

@gemini-code-assist (bot, Contributor) left a comment:

Summary of Changes

Hello @hubertlu-tw, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the WARP_SIZE definition within sgl-kernel to remove the deprecated C10_WARP_SIZE macro from PyTorch. The change keeps the kernels compatible with newer PyTorch versions by selecting warp_size for ROCm at compile time based on the graphics architecture: 64 when __GFX9__ is defined (or during the host compilation pass), 32 otherwise.

Highlights

  • Deprecation Removal: Replaced the deprecated C10_WARP_SIZE macro, which was previously used for determining WARP_SIZE in ROCm environments.
  • ROCm Warp Size Logic Update: Introduced conditional compilation to determine WARP_SIZE for ROCm. If __GFX9__ is defined or __HIP_DEVICE_COMPILE__ is not defined, WARP_SIZE is set to 64; otherwise, it defaults to 32.
  • Header Clean-up: Removed unnecessary includes for ATen/cuda/CUDAContext.h and c10/macros/Macros.h as they are no longer needed for WARP_SIZE determination.

@gemini-code-assist (bot, Contributor) left a review:

Code Review

This pull request removes the deprecated C10_WARP_SIZE and updates the logic for determining WARP_SIZE for ROCm builds. While removing the deprecated macro is a good improvement, the new logic sets WARP_SIZE to 64 for GFX9 architectures, which is not compatible with the existing warpReduceMax function. This function is hardcoded for a 32-wide warp, and using it with a 64-wide warp will lead to incorrect calculations. I've identified this as a critical issue and provided a suggestion for fixing the reduction function.

Comment on lines +334 to +338
#if defined(__GFX9__) || !defined(__HIP_DEVICE_COMPILE__)
#define WARP_SIZE 64
#else
#define WARP_SIZE 32
#endif
@gemini-code-assist (bot) commented (severity: critical):

This change correctly removes the deprecated C10_WARP_SIZE. However, by defining WARP_SIZE as 64 for GFX9+ architectures, it introduces a bug in the warpReduceMax function (lines 400-407), which is hardcoded for a warp size of 32.

With WARP_SIZE=64, warpReduceMax will only perform a partial reduction, leading to incorrect results. The blockReduceMax function (line 409) will also be affected as it relies on both warpReduceMax and WARP_SIZE.

To fix this, warpReduceMax must be updated to handle a 64-wide warp. Here is a suggested implementation:

__device__ __forceinline__ float warpReduceMax(float value) {
#if WARP_SIZE == 64
  // 64-lane wavefronts (GFX9): first fold the upper 32 lanes into the lower 32.
  value = fmaxf(value, __shfl_xor_sync(FULL_MASK, value, 32));
#endif
  value = fmaxf(value, __shfl_xor_sync(FULL_MASK, value, 16));
  value = fmaxf(value, __shfl_xor_sync(FULL_MASK, value, 8));
  value = fmaxf(value, __shfl_xor_sync(FULL_MASK, value, 4));
  value = fmaxf(value, __shfl_xor_sync(FULL_MASK, value, 2));
  value = fmaxf(value, __shfl_xor_sync(FULL_MASK, value, 1));
  return value;
}
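
The review also notes that blockReduceMax depends on warpReduceMax and WARP_SIZE. As a point of reference only, a warp-size-agnostic blockReduceMax typically looks roughly like the sketch below; the shared-memory sizing, the -FLT_MAX sentinel, and the exact signature are assumptions, not the code in this file.

#include <cfloat>  // for FLT_MAX (assumed available in the translation unit)

__device__ __forceinline__ float blockReduceMax(float value) {
  // One slot per warp; 32 slots cover up to 1024 threads even when WARP_SIZE == 64.
  __shared__ float shared[32];
  const int lane = threadIdx.x % WARP_SIZE;
  const int wid = threadIdx.x / WARP_SIZE;

  value = warpReduceMax(value);        // reduce within each warp/wavefront
  if (lane == 0) shared[wid] = value;  // lane 0 publishes its warp's partial max
  __syncthreads();

  // The first warp reduces the per-warp partial maxima.
  const int num_warps = (blockDim.x + WARP_SIZE - 1) / WARP_SIZE;
  value = (threadIdx.x < num_warps) ? shared[threadIdx.x] : -FLT_MAX;
  if (wid == 0) value = warpReduceMax(value);
  return value;
}

With warpReduceMax fixed as suggested above, a structure like this stays correct for both 32- and 64-lane wavefronts because every warp-dependent quantity (lane, wid, num_warps) is derived from the same WARP_SIZE macro.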

@hubertlu-tw changed the title from "Remove the deprecated C10_WARP_SIZE" to "[AMD] Remove the deprecated C10_WARP_SIZE" on Aug 19, 2025
@zhyncs self-assigned this on Aug 20, 2025
@zhyncs merged commit 704ced1 into sgl-project:main on Aug 22, 2025
91 of 93 checks passed
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request on Aug 22, 2025
xjpang pushed a commit to xjpang/sglang that referenced this pull request on Aug 22, 2025