Conversation

fecet

@fecet fecet commented Apr 4, 2025

Motivation

This PR addresses several critical issues in the current build scripts and configuration files for the sgl-kernel:

  • Ineffective Environment Variable Handling:
    Relying on environment variables to set CMake options is ineffective because these settings may not be correctly passed to CMake. Instead, using the -D flag ensures that options are explicitly defined and recognized by the build system.

  • Overly Permissive Feature Enabling:
    The current logic enables SGL_KERNEL_ENABLE_SM100A and SGL_KERNEL_ENABLE_SM90A based on an OR condition, which can inadvertently activate these features even when the CUDA version does not fully support them. By switching to an AND condition, the features are enabled only when the CUDA version is compatible and they are explicitly requested, leading to a more predictable and robust build process.

  • Premature CUDA Version Selection:
    The original scripts default to CUDA 12.8, which is currently unsupported by the latest PyTorch releases. This mismatch causes the scripts to fail at runtime. Adjusting the default CUDA version to 12.6 aligns with the supported versions and meets the requirements for integrating Flash Attention.
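The first point above can be sketched as follows. This is an illustrative snippet only (the variable names mirror those used in this PR, but the real script may differ): options composed as explicit `-D` flags always reach CMake's cache, whereas exported environment variables may never be seen by the generator.

```shell
# Sketch only: build explicit -D flags instead of exporting env vars.
CUDA_VERSION=12.6
CMAKE_ARGS="-DCUDA_VERSION=${CUDA_VERSION} -DSGL_KERNEL_ENABLE_SM90A=ON"
# The real script would then invoke: cmake -S . -B build ${CMAKE_ARGS}
echo "cmake -S . -B build ${CMAKE_ARGS}"
```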

Modifications

  • Workflow Update:

    • Modified the GitHub Actions workflow to set the CUDA version to 12.6 instead of 12.8. This change ensures that the build environment is compatible with the supported PyTorch version and prevents runtime failures.
  • CMakeLists.txt Changes:

    • Revised the conditional checks by replacing logical OR with logical AND. This ensures that features such as SM100A and SM90A are enabled only when both the CUDA version is sufficiently high and the feature is explicitly requested.
  • build.sh Adjustments:

    • Updated the script to properly set the build flags (ENABLE_SM90A and BUILD_FA3) based on the CUDA version. For CUDA versions below 12, these features are explicitly disabled to prevent build errors.
    • Enhanced the handling of configuration options by passing them via CMake’s -D parameters instead of relying solely on environment variables.
  • New Script Addition:
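The build.sh gating described above can be sketched roughly as follows (a hedged sketch, not the PR's exact script; the `ENABLE_SM90A`/`BUILD_FA3` names come from this discussion): for CUDA versions below 12, both features are forced off.

```shell
# Sketch: disable Hopper-specific features when the CUDA toolkit is too old.
CUDA_VERSION=11.8   # example input; the real script takes this as an argument
if awk "BEGIN {exit !($CUDA_VERSION < 12)}"; then
  ENABLE_SM90A=OFF
  BUILD_FA3=OFF
else
  ENABLE_SM90A=ON
  BUILD_FA3=ON
fi
echo "ENABLE_SM90A=${ENABLE_SM90A} BUILD_FA3=${BUILD_FA3}"
```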

Checklist

@fecet fecet changed the title from "Update sgl-kernel Build Scripts for CUDA Compatibility" to "update sgl-kernel build scripts for CUDA compatibility" Apr 4, 2025
@fecet fecet force-pushed the ci/cu12-build branch 4 times, most recently from 25efcbf to 8c0206e on April 5, 2025 03:55
@FlamingoPg
Collaborator

nice work~

@FlamingoPg
Collaborator

We need cu118, cu124, and cu128 (for B200). Can we support cu128?

@fecet fecet force-pushed the ci/cu12-build branch 3 times, most recently from 9bacb8d to 3faf0f6 on April 5, 2025 05:06
@fecet
Author

fecet commented Apr 5, 2025

It currently works on cu118 and cu124 (and 126). For cu128, it seems more work is still needed.

@fecet
Author

fecet commented Apr 5, 2025

Supporting cuda12.8 is a bit beyond the scope of this pull request. The goal of this PR is to fix the original CMakeLists.txt and build scripts.
Can this be merged first? I will study how to support cu128 in a new branch.

@zhyncs
Member

zhyncs commented Apr 5, 2025

@fecet cu118 and cu124 should not be merged. cu124 is for pypi and cu118 is for https://github.com/sgl-project/whl


```diff
-if ("${CUDA_VERSION}" VERSION_GREATER_EQUAL "12.8" OR SGL_KERNEL_ENABLE_SM100A)
+if ("${CUDA_VERSION}" VERSION_GREATER_EQUAL "12.8" AND SGL_KERNEL_ENABLE_SM100A)
```
Member

I think the OR logic is correct. When building, we only need to consider the CUDA version. If it is equal to or greater than 12.8, we should include the sm100a feature. Alternatively, if the user specifies sm100a, we should also enable this feature as intended.

Member

The building machine can be a GPU or CPU node. Please refer to this link: #5062 (comment)

```diff
@@ -124,8 +123,7 @@ else()
   )
 endif()
 
-if ("${CUDA_VERSION}" VERSION_GREATER_EQUAL "12.4" OR SGL_KERNEL_ENABLE_SM90A)
-  set(BUILD_FA3 ON)
+if ("${CUDA_VERSION}" VERSION_GREATER_EQUAL "12.4" AND SGL_KERNEL_ENABLE_SM90A)
```
Member

same as above

Author

If we assume that CMakeLists.txt is only used by build.sh, then this logic would be acceptable. However, users might want to compile and package sglang according to their own requirements (I use conda to package sglang). In that case, the current logic is a bit odd. For example, even if the user explicitly disables the option, it will still be enabled if the CUDA version is high enough; or if the CUDA version is insufficient but the user enables the option, the build will fail.
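The difference can be made concrete with a small shell truth table (illustrative only, not the actual CMake code): under OR, a high-enough CUDA version enables the feature even when the user turned it off; under AND, both conditions must hold.

```shell
# feature_on CUDA_OK USER_ENABLED MODE -> succeeds if the feature would build
feature_on() {
  if [ "$3" = "OR" ]; then
    [ "$1" -eq 1 ] || [ "$2" -eq 1 ]
  else
    [ "$1" -eq 1 ] && [ "$2" -eq 1 ]
  fi
}
# New-enough CUDA, but the user left the option OFF:
if feature_on 1 0 OR;  then or_result=enabled;  else or_result=disabled;  fi
if feature_on 1 0 AND; then and_result=enabled; else and_result=disabled; fi
echo "OR:$or_result AND:$and_result"  # OR ignores the user's OFF; AND respects it
```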

Member

> I use conda to package sglang

ah it's interesting

@fecet
Author

fecet commented Apr 5, 2025

The current script can actually work on cu128. There were some minor issues in my previous test.
log.txt

@zhyncs
Member

zhyncs commented Apr 5, 2025

> The current script can actually work on cu128. There were some minor issues in my previous test. log.txt

I think latest main can also work for cu128

@zhyncs
Member

zhyncs commented Apr 5, 2025

ref #5074

```diff
@@ -190,7 +188,7 @@ set(SOURCES
 
 # set flash-attention sources file
 # BF16 source files
-if (BUILD_FA3)
+if ("${CUDA_VERSION}" VERSION_GREATER_EQUAL "12.3" AND BUILD_FA3)
```
Member

This is good @yinfan98

Collaborator

I see

@zhyncs
Member

zhyncs commented Apr 5, 2025

```
pytorch/manylinux-builder:cuda118
pytorch/manylinux-builder:cuda124
pytorch/manylinux2_28-builder:cuda128
```

FYI, we currently use these Docker images as base images to build wheels, and the latest main works well.

@zhyncs
Member

zhyncs commented Apr 5, 2025

I think you can keep `if ("${CUDA_VERSION}" VERSION_GREATER_EQUAL "12.3" AND BUILD_FA3)`; the other parts I still need to consider for now.

@fecet
Author

fecet commented Apr 5, 2025

I have reverted the logic for SGL_KERNEL_ENABLE_SM90A and SGL_KERNEL_ENABLE_SM100A, but this does undermine the point of this PR. The current main branch relies on CMakeLists.txt to adjust the options, which couples CMakeLists.txt to build.sh and may confuse users about the build behavior
(for example, the variable ENABLE_SM90A in the script is not actually used; it is never set to ON or OFF but remains a constant 0).


```shell
CUDA_VERSION=$1

if awk "BEGIN {exit !($CUDA_VERSION >= 12.6 && $CUDA_VERSION < 12.8)}"; then
```
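The awk expression above performs a numeric range check on the version string; a quick standalone demonstration (note that it compares floats, so it assumes two-component versions like 12.6 and would misorder something like 12.10):

```shell
# Returns success iff the given version is in [12.6, 12.8).
in_range() {
  awk "BEGIN {exit !($1 >= 12.6 && $1 < 12.8)}"
}
if in_range 12.6; then r1=yes; else r1=no; fi
if in_range 12.8; then r2=yes; else r2=no; fi
echo "12.6 in range: $r1; 12.8 in range: $r2"
```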
Member

@hebiao064 @qingquansong May you help verify on your cu126 env?

Collaborator

just saw it, working on it now

Collaborator

still the same issue from our machine: https://www.diffchecker.com/ROw0JSCD/

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
Tue Apr  8 21:01:57 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:04:00.0 Off |                    0 |
| N/A   31C    P0             77W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:23:00.0 Off |                    0 |
| N/A   30C    P0             76W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:43:00.0 Off |                    0 |
| N/A   32C    P0             78W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:64:00.0 Off |                    0 |
| N/A   27C    P0             74W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:84:00.0 Off |                    0 |
| N/A   31C    P0             76W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:A3:00.0 Off |                    0 |
| N/A   28C    P0             75W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:C3:00.0 Off |                    0 |
| N/A   30C    P0             76W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:E4:00.0 Off |                    0 |
| N/A   28C    P0             76W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

Collaborator

If it works on lmsys's image, I think it should be fine; let's not block this PR.

Author

Hi @hebiao064, did you test on bare metal or in docker? Can you tell me your test command?

Member

I don't think we need this anymore. Could you try the latest main with this fix? #5245

Author

I don't see the relationship between them. #5245 solves an error when importing after compilation, while the code here addresses the failure to compile under cu126. According to Dao-AILab/flash-attention#1453, this should be necessary.

Author

I tried the latest main branch and got:

```
2025-04-11 01:17:35 (10.4 MB/s) - ‘cmake-3.31.1-linux-x86_64.tar.gz’ saved [54989177/54989177]

PATH: /opt/cmake/bin:/usr/local/cuda/bin:/opt/rh/devtoolset-9/root/usr/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/opt/cmake/bin/cmake
cmake version 3.31.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).
Looking in indexes: https://download.pytorch.org/whl/cu126
ERROR: Could not find a version that satisfies the requirement torch==2.5.1 (from versions: none)
ERROR: No matching distribution found for torch==2.5.1
```

for `build.sh 3.9 12.6`

Member

I don’t think so. For cu126, you should also use manylinux 2.28

@fecet (Author) commented Apr 11, 2025

I should also use torch==2.6.0; here is my result:
log.txt

```
[29/82] Building CUDA object CMakeFiles/flash_ops.dir/_deps/repo-flash-attention-src/hopper/instantiations/flash_fwd_hdimall_bf16_paged_softcap_sm90.cu.o
FAILED: CMakeFiles/flash_ops.dir/_deps/repo-flash-attention-src/hopper/instantiations/flash_fwd_hdimall_bf16_paged_softcap_sm90.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DFLASHATTENTION_DISABLE_BACKWARD -DFLASHATTENTION_DISABLE_DROPOUT -DFLASHATTENTION_DISABLE_SM8x -DFLASHATTENTION_DISABLE_UNEVEN_K -DFLASHATTENTION_VARLEN_ONLY -DPy_LIMITED_API=0x03090000 -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dflash_ops_EXPORTS -I/sgl-kernel/include -I/sgl-kernel/csrc -I/sgl-kernel/build/_deps/repo-cutlass-src/include -I/sgl-kernel/build/_deps/repo-cutlass-src/tools/util/include -I/sgl-kernel/build/_deps/repo-flashinfer-src/include -I/sgl-kernel/build/_deps/repo-flashinfer-src/csrc -I/sgl-kernel/build/_deps/repo-flash-attention-src/hopper -isystem /opt/python/cp39-cp39/lib/python3.9/site-packages/torch/include -isystem /opt/python/cp39-cp39/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/python/cp39-cp39/include/python3.9 -isystem /usr/local/cuda/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++17 -Xcompiler=-fPIC -DNDEBUG -DOPERATOR_NAMESPACE=sgl-kernel -O3 -Xcompiler -fPIC -gencode=arch=compute_90a,code=sm_90a -std=c++17 -DCUTE_USE_PACKED_TUPLE=1 -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_VERSIONS_GENERATED -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -D_GLIBCXX_USE_CXX11_ABI=1 -MD -MT CMakeFiles/flash_ops.dir/_deps/repo-flash-attention-src/hopper/instantiations/flash_fwd_hdimall_bf16_paged_softcap_sm90.cu.o -MF CMakeFiles/flash_ops.dir/_deps/repo-flash-attention-src/hopper/instantiations/flash_fwd_hdimall_bf16_paged_softcap_sm90.cu.o.d -x cu -c /sgl-kernel/build/_deps/repo-flash-attention-src/hopper/instantiations/flash_fwd_hdimall_bf16_paged_softcap_sm90.cu -o CMakeFiles/flash_ops.dir/_deps/repo-flash-attention-src/hopper/instantiations/flash_fwd_hdimall_bf16_paged_softcap_sm90.cu.o
nvcc error   : 'ptxas' died due to signal 11 (Invalid memory reference)
nvcc error   : 'ptxas' core dumped
```

@zhyncs
Member

zhyncs commented Apr 5, 2025

Overall great work! Thanks for your contribution. Once cu126 has been verified by @hebiao064 and @qingquansong, I'll merge it. Thanks!

@FlamingoPg
Collaborator

Others LGTM

@FlamingoPg
Collaborator

> Overall great work! Thanks for your contribution. Once cu126 has been verified by @hebiao064 and @qingquansong, I'll merge it. Thanks!

I will try cu126 now

@ltm920716

> The current script can actually work on cu128. There were some minor issues in my previous test. log.txt
>
> I think latest main can also work for cu128

hi @zhyncs
I ran lmsysorg/sglang:latest (6af4cd9b0f09) on B200 with driver 570 and CUDA 12.8, and the container throws an exception:

```
WARNING: Detected NVIDIA B200 GPU, which is not yet supported in this version of the container
(the warning above is repeated 8 times, once per GPU)
ERROR: No supported GPU(s) detected to run this container
```

```
INFO 04-09 06:42:11 __init__.py:190] Automatically detected platform cuda.
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:235: UserWarning:
NVIDIA B200 with CUDA capability sm_100 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA B200 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
```

So do I need to build from source, and what files should I change? Thanks!
