vlm: tensor hash kernel #5974
Conversation
f460e17 to 9a214b5
@@ -222,7 +223,8 @@ def tensor_hash(tensor_list) -> int:
        for x in tensor_list
    ]
    tensor = torch.concat(tensor_list)

    if tensor.is_cuda:
Why will a tensor be on GPU?
If the fast version of the processor is enabled, the returned tensor will be on GPU by default; see here.
    )

    # TODO: threads can't be synced on triton kernel
    final_hash = intermediate_hashes.sum().item()
`sum` is not a good combinator for a hash function.
This is a very bad hash function! @yizhang2077 @zhyncs @mickqian
Related links:
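To make the reviewer's point concrete, here is a torch-free sketch (all function names are hypothetical, not from the PR) of a per-chunk hash-then-reduce scheme. A plain sum, as in `intermediate_hashes.sum()`, is order-insensitive, so any permutation of the chunks produces the same result; an order-sensitive fold over the intermediate digests does not collide this way.

```python
import hashlib


def chunk_hash(chunk: bytes) -> int:
    # Per-chunk hash, standing in for the per-thread intermediate hashes.
    return int.from_bytes(hashlib.sha256(chunk).digest()[:8], "little")


def sum_combine(chunks) -> int:
    # Order-insensitive reduction, analogous to intermediate_hashes.sum():
    # permuting the chunks does not change the result.
    return sum(chunk_hash(c) for c in chunks) % 2**64


def chained_combine(chunks) -> int:
    # Order-sensitive alternative: fold each intermediate digest into a
    # running hash, so permuting the chunks changes the result.
    h = hashlib.sha256()
    for c in chunks:
        h.update(hashlib.sha256(c).digest())
    return int.from_bytes(h.digest()[:8], "little")


a = [b"foo", b"bar"]
b = [b"bar", b"foo"]
print(sum_combine(a) == sum_combine(b))          # True: permutation collides
print(chained_combine(a) == chained_combine(b))  # False
```

The sequential fold is shown only for contrast; a GPU-friendly combinator would need to stay order-aware while remaining parallel-reducible, for example by mixing the chunk index into each intermediate hash before reducing.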
Motivation
Previously, for a GPU tensor, the hashing required by multimodal models would first:
With some simple profiling, hashing a typical image feature (e.g., from the sgl-logo image, shape=[3312,1176], dtype=float32 in the qwen-vl case) costs ~80 ms.
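To put the ~80 ms figure in context, a [3312, 1176] float32 feature is roughly 15.6 MB. The following torch-free sketch (not the PR's code) times a plain host-side SHA-256 over a buffer of the same size; it excludes the device-to-host copy that a GPU tensor would additionally pay on the old path.

```python
import hashlib
import os
import time

# A [3312, 1176] float32 tensor occupies 3312 * 1176 * 4 bytes (~15.6 MB).
N_BYTES = 3312 * 1176 * 4

# Random buffer standing in for the image feature's raw bytes.
data = os.urandom(N_BYTES)

start = time.perf_counter()
digest = hashlib.sha256(data).digest()
elapsed_ms = (time.perf_counter() - start) * 1e3
print(f"hashed {N_BYTES / 1e6:.1f} MB in {elapsed_ms:.1f} ms")
```

Timings will vary by machine; the point is only that single-threaded CPU hashing of multi-megabyte features, plus the transfer it implies, is a plausible source of the measured latency.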
Modifications
Profiling
Hash performance
MMMU
Correctness
Future Work
Checklist