Skip to content

Conversation

yuhsuan-t
Copy link
Contributor

@yuhsuan-t yuhsuan-t commented Apr 2, 2025

Motivation

Torch profiling and nsys profiling are mutual exclusive. When setting up profiling through the /start_profile endpoint, the current activities option does not block users from setting both up. Doing so will cause CUPTI error.

Modifications

This PR adds a post init validation to make sure only one of the profiling is set.

Checklist

@yuhsuan-t
Copy link
Contributor Author

This is a fix to PR #4688. @fzyzcjy can you review? Thanks!

@@ -652,6 +652,14 @@ class ProfileReqInput:
num_steps: Optional[int] = None
activities: Optional[List[Literal["CPU", "GPU", "MEM", "CUDA_PROFILER"]]] = None

def __post_init__(self):
if "CUDA_PROFILER" in self.activities and len(self.activities) > 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: What if self.activities is None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -652,6 +652,14 @@ class ProfileReqInput:
num_steps: Optional[int] = None
activities: Optional[List[Literal["CPU", "GPU", "MEM", "CUDA_PROFILER"]]] = None

def __post_init__(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be best to add a unit test to ensure this behavior never regresses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added.

@@ -652,6 +652,14 @@ class ProfileReqInput:
num_steps: Optional[int] = None
activities: Optional[List[Literal["CPU", "GPU", "MEM", "CUDA_PROFILER"]]] = None

def __post_init__(self):
if "CUDA_PROFILER" in self.activities and len(self.activities) > 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: len(self.activities)>1 may not consider future logic like adding more activities. Thus what about

enable_torch_profiler = ...is there CPU GPU MEM...
enable_nsys = ...is there CUDA_PROFILER...
if enable_nsys and enable_torch_profiler: err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change. Please review.

@yuhsuan-t yuhsuan-t force-pushed the yuhsuan-t/nsys_profiling_fix branch from 652e79a to a70a070 Compare April 3, 2025 20:43
@yuhsuan-t yuhsuan-t requested a review from zhyncs as a code owner April 3, 2025 20:43
@@ -30,6 +30,7 @@ class TestFile:
TestFile("test_ebnf_constrained.py"),
TestFile("test_fp8_kernel.py", 8),
TestFile("test_embedding_openai_server.py", 36),
TestFile("test_expert_distribution.py", 31),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was accidentally removed in this commit: 04e3ff6#diff-e5d098bede71ca21f8307897ae33646a035fe3bd1e0cff0aba99e7f8ac033a5fL33
Adding it back here.

@yuhsuan-t yuhsuan-t requested a review from fzyzcjy April 3, 2025 20:46
Comment on lines +658 to +666
nsys_profile = False
if "CUDA_PROFILER" in self.activities:
nsys_profile = True
torch_profile = False
if (
"CPU" in self.activities
or "GPU" in self.activities
or "MEM" in self.activities
):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nsys_profile = "CUDA_PROFILER" in self.activities
torch_profile = (
            "CPU" in self.activities
            or "GPU" in self.activities
            or "MEM" in self.activities
        )

@@ -57,6 +58,7 @@ class TestFile:
TestFile("test_skip_tokenizer_init.py", 72),
TestFile("test_srt_engine.py", 237),
TestFile("test_srt_endpoint.py", 94),
TestFile("test_start_profile.py", 22),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this file seems to forget to be commited

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there seems no this file :(

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! I see, I will add it. Thanks!

Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants