[ROCm] Enable per token group quant fp8 in amd #3702
base: main
Conversation
@HaiShaw Hi, can you take a look? Thanks.
Preferably, a code refactor is needed.
There are also some correctness issues to solve.
if is_hip_:
    fp8_max = 224
else:
    fp8_max = finfo.max
Can you make an FP8_E4M3_MAX global (outside of functions), and refer to it later?
Sorry for the late reply. I have been working on MLA-related functionality since yesterday.
Sure, I can put it inside "sglang.srt.utils" so that it comes with "_is_hip".
Does that sound good?
Also, can I do it in a later PR, since this modification may be out of scope for this PR? I will fix it as you suggested.
Done.
FYI, #3959
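As a rough illustration of the refactor discussed above, a module-level constant could replace the per-function branch. This is a hedged sketch: the `fp8_e4m3_max` helper and the hard-coded constants are illustrative assumptions, not the code in this PR; the real version would live in `sglang.srt.utils` alongside `_is_hip` and derive the CUDA value from `torch.finfo`.

```python
# Illustrative sketch of the FP8_E4M3_MAX global suggested in the review.
# The constants below are assumptions for illustration; real code would
# use torch.finfo(torch.float8_e4m3fn).max on the CUDA side.
E4M3FN_MAX = 448.0    # max of float8_e4m3fn (CUDA)
E4M3FNUZ_MAX = 224.0  # max of float8_e4m3fnuz (ROCm), as in the diff above


def fp8_e4m3_max(is_hip: bool) -> float:
    """Return the platform-appropriate FP8 e4m3 max value."""
    return E4M3FNUZ_MAX if is_hip else E4M3FN_MAX


# A single module-level constant then replaces the per-function if/else:
# FP8_E4M3_MAX = fp8_e4m3_max(_is_hip)
```

Callers would then refer to `FP8_E4M3_MAX` directly instead of recomputing the branch inside each kernel wrapper.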
#include <flashinfer/vec_dtypes.cuh>
#else
#include "hip_vec_dtypes.h"
We should not boilerplate sgl-kernel code with flashinfer's.
It is better to make the changes in flashinfer and then use it.
Yes, I agree. I have marked it as a temporary solution, since flashinfer-rocm is not fully supported and ready to use.
As far as I know, SGLang will continue to use flashinfer::vec_t for vectorized 128-bit data loading. With this temporary support, we don't need to modify the related CUDA code.
Does that sound reasonable?
// Adapted from flashinfer
#define FLASHINFER_INLINE inline __attribute__((always_inline)) __device__
No need to keep using FLASHINFER_INLINE here; it is a very common macro.
Yes, it comes with the temporary flashinfer::vec_t device function support.
Force-pushed from c777940 to e1ec0e8
Please fix the conflicts.
Motivation
This is a follow-up to PR #3664.
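To make the feature concrete, here is a hedged sketch of what per-token-group FP8 quantization computes, not the kernel in this PR: each group of values within a token's row gets its own scale derived from the group's absolute maximum, so the group's largest value maps to the FP8 max.

```python
FP8_E4M3_MAX = 448.0  # float8_e4m3fn max; a ROCm e4m3fnuz variant would use 224.0


def per_token_group_quant(row, group_size, fp8_max=FP8_E4M3_MAX):
    """Quantize one token's row in groups of `group_size`: each group
    gets its own scale so its absolute maximum maps to fp8_max.
    Illustrative reference logic only, not the fused kernel."""
    assert len(row) % group_size == 0
    quantized, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        amax = max(abs(v) for v in group) or 1.0  # avoid division by zero
        scale = amax / fp8_max
        scales.append(scale)
        quantized.extend(v / scale for v in group)
    return quantized, scales
```

The real kernel fuses this per-group scan and scaling on the GPU and casts the scaled values to the FP8 dtype; the sketch keeps the values in float to show the arithmetic.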
Modifications
ROCm test
Checklist