Conversation

yiakwy-xpu-ml-framework-team
Contributor

@yiakwy-xpu-ml-framework-team commented Feb 19, 2025

Motivation

This is a follow-up of PR #3664.

Modifications

  • enable `shfl_xor_sync` on the AMD platform
  • enable flashinfer `vec_t` on the AMD platform (to be removed once flashinfer-rocm [POC] is ready)
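For context, the butterfly exchange that `__shfl_xor_sync` performs can be sketched on the host. This is an illustrative simulation only, not the device intrinsic; the 64-lane width reflects AMD wavefronts (vs. 32-lane CUDA warps), which is why the intrinsic needs platform-specific handling:

```python
# Simulate the butterfly (XOR) shuffle reduction that __shfl_xor_sync
# enables on GPU hardware. Each "lane" holds one value; at every step,
# lane i exchanges its partial sum with lane i ^ mask. After
# log2(width) steps, every lane holds the full sum.
# NOTE: host-side illustration only; not the device intrinsic.

def shfl_xor_reduce(values, width=64):
    lanes = list(values)
    assert len(lanes) == width and width & (width - 1) == 0
    mask = width // 2
    while mask:
        # every lane reads from its XOR partner, then accumulates
        lanes = [lanes[i] + lanes[i ^ mask] for i in range(width)]
        mask //= 2
    return lanes  # every lane now agrees on the total

if __name__ == "__main__":
    result = shfl_xor_reduce(range(64))
    print(result[0])  # all 64 lanes hold the same sum
```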

ROCm test

(Screenshot 2025-03-06 15:57:47)

Checklist

@BBuf
Collaborator

BBuf commented Feb 20, 2025

@HaiShaw Hi, can you have a look? Thanks.

@HaiShaw HaiShaw self-requested a review February 23, 2025 03:54
Collaborator

@HaiShaw HaiShaw left a comment


Preferably, a code refactor is needed.
Also, there are some correctness issues to solve.

if is_hip_:
    fp8_max = 224
else:
    fp8_max = finfo.max
Collaborator

Can you make an FP8_E4M3_MAX global (outside of functions), and refer to it later?
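A sketch of what such a global might look like, with the format maxima derived from first principles. The `_is_hip` flag here is a hypothetical stand-in for the platform check discussed below; note that OCP e4m3 tops out at 448 and AMD's e4m3fnuz variant at 240, so the 224 in the snippet above is a more conservative value still:

```python
# Derive the largest finite value of an FP8 e4m3 format.
# OCP e4m3 reserves the all-ones mantissa at the top exponent for NaN,
# so its max normal is 1.75 * 2^8 = 448. The fnuz variant (used on AMD
# hardware) frees that encoding but shifts the exponent bias, giving
# 1.875 * 2^7 = 240.

def fp8_e4m3_max(fnuz: bool) -> float:
    if fnuz:
        bias = 8              # e4m3fnuz exponent bias
        mantissa = 1 + 7 / 8  # all mantissa codes usable
    else:
        bias = 7              # OCP e4m3 exponent bias
        mantissa = 1 + 6 / 8  # top mantissa code reserved for NaN
    return mantissa * 2.0 ** (15 - bias)

# Hypothetical module-level constant, mirroring the reviewer's request;
# _is_hip stands in for the real platform check (assumption).
_is_hip = False
FP8_E4M3_MAX = 224.0 if _is_hip else fp8_e4m3_max(fnuz=False)
```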

Contributor Author

@yiakwy-xpu-ml-framework-team yiakwy-xpu-ml-framework-team Feb 25, 2025


Sorry for the late reply. I have been working on MLA-related functions since yesterday.

Sure, I can put it inside "sglang.srt.utils" so that it comes with "_is_hip".

Does that sound good?

Also, can I do this in a later PR, since this modification may be out of scope for this PR? I will fix it as you suggested.

Contributor Author


Done.

FYI, #3959

#include <flashinfer/vec_dtypes.cuh>
#else
#include "hip_vec_dtypes.h"
Collaborator

We should not duplicate flashinfer's code in sgl-kernel.
It is better to make the changes in flashinfer, and then use it from there.

Contributor Author

@yiakwy-xpu-ml-framework-team yiakwy-xpu-ml-framework-team Feb 25, 2025


Yes, I agree. I have marked it as a temporary solution, since flashinfer-rocm is not fully supported and ready to use.

As far as I know, SGLang will continue to use flashinfer::vec_t for vectorized 128-bit data loading. With this temporary support, we don't need to modify the related CUDA code.

Does that sound reasonable?
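The idea behind the 128-bit vectorized loading mentioned above can be illustrated on the host: move several elements as one 16-byte chunk rather than element by element. A Python sketch using the standard `struct` module (illustrative only; the real `flashinfer::vec_t` is a device-side aligned-load template):

```python
import struct

# Illustrate 128-bit vectorized access: instead of four separate
# 32-bit loads, move one 16-byte chunk at a time, analogous to what
# vec_t does with aligned device loads.

def store_vec4_f32(values) -> bytes:
    """Pack four float32 values into one 16-byte chunk."""
    return struct.pack("<4f", *values)

def load_vec4_f32(buf: bytes, offset: int):
    """Read four float32 values (128 bits) in a single unpack."""
    assert offset % 16 == 0, "vectorized loads require 16-byte alignment"
    return struct.unpack_from("<4f", buf, offset)

if __name__ == "__main__":
    chunk = store_vec4_f32([1.0, 2.0, 3.0, 4.0])
    print(load_vec4_f32(chunk, 0))
```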


// Adapted from flashinfer

#define FLASHINFER_INLINE inline __attribute__((always_inline)) __device__
Collaborator

No need to keep using FLASHINFER_INLINE here; it is a very common macro.

Contributor Author

@yiakwy-xpu-ml-framework-team yiakwy-xpu-ml-framework-team Feb 25, 2025


Yes, it comes with the temporary flashinfer::vec_t device function support.

@yiakwy-xpu-ml-framework-team yiakwy-xpu-ml-framework-team force-pushed the enable_per_token_group_quant_fp8_in_amd branch from c777940 to e1ec0e8 Compare March 6, 2025 08:15
@BBuf BBuf mentioned this pull request Mar 7, 2025
@yiakwy-xpu-ml-framework-team
Contributor Author

Checklist:

[ ] rebase, adapt after #3959 is merged
[ ] make sure the test_mla issue is fixed and verify test_mla #4214

@merrymercy
Contributor

Please fix the conflicts.
