[GCU] Enable gcu CI #3190
Conversation
Force-pushed from 800027d to 069474d
Thanks for your contribution!
Force-pushed from 03f8018 to 00a565f
LGTM
Force-pushed from f75d8eb to 5d9aa0c
@@ -675,7 +675,7 @@ def initialize_attn_backend(self) -> None:
            )
            self.share_inputs["decoder_batch_ids"] = paddle.full([int(decode_max_tile_size)], 0, dtype="int32")
            self.share_inputs["decoder_tile_ids_per_batch"] = paddle.full([int(decode_max_tile_size)], 0, dtype="int32")
-           self.share_inputs["decoder_num_blocks_cpu"] = paddle.full([1], 0, dtype="int32").pin_memory()
Why was this removed outright? After the removal it becomes a GPU tensor. If pinned memory is not used, shouldn't `.cpu()` at least be added?
This tensor is ultimately consumed by the `get_block_shape_and_split_kv_block` kernel.
Done
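To illustrate the placement question in this thread, here is a minimal sketch using a hypothetical `FakeTensor` stand-in (the real code uses `paddle.Tensor`, whose `.pin_memory()` and `.cpu()` methods move data to page-locked and pageable host memory, respectively; the class and `full` helper below are illustrative only, not PR code):

```python
class FakeTensor:
    """Hypothetical stand-in that only tracks where a tensor lives."""

    def __init__(self, data, place="gpu"):
        self.data = list(data)
        self.place = place  # "gpu", "cpu", or "pinned"

    def pin_memory(self):
        # Page-locked host memory: host-resident, fast async H2D/D2H copies.
        return FakeTensor(self.data, place="pinned")

    def cpu(self):
        # Pageable host memory: host-resident, but copies may be slower.
        return FakeTensor(self.data, place="cpu")


def full(shape, fill, place="gpu"):
    """Mimics paddle.full: allocates on the default (accelerator) device."""
    return FakeTensor([fill] * shape[0], place=place)


# The review point: dropping .pin_memory() leaves the tensor on the
# accelerator, but a kernel that reads it on the host needs it to be at
# least .cpu() if pinned memory is not used.
t_default = full([1], 0)               # lives on the accelerator
t_pinned = full([1], 0).pin_memory()   # page-locked host memory
t_cpu = full([1], 0).cpu()             # pageable host memory
```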
Force-pushed from cce50d4 to 8395c3a
-        self.seq_lens_this_time_buffer[:num_running_requests].copy_(
-            self.share_inputs["seq_lens_this_time"][:num_running_requests], False
-        )
+        self.seq_lens_this_time_buffer.copy_(self.share_inputs["seq_lens_this_time"], False)
What is the reason for this change?
- This change mainly goes with the modification at line 300, so that the complete data is updated:
  self.share_inputs["seq_lens_this_time"] = self.seq_lens_this_time_buffer
- Why `real_bsz` is not adopted on GCU for now: the AttentionBackend and the pre-/post-processing ops (`update_inputs_gcu`, `set_value_by_flags_and_idx_gcu`, etc.) perform some operations based on the shape of `seq_lens_this_time`, so they would need a unified rework.
- Possible impacts of the `real_bsz` change on GCU:
  - For the scheduling system: does it need to guarantee that the `num_running_requests` requests scheduled in this round are placed at the front of the whole task list?
  - For custom ops and other places that use `seq_lens_this_time`: the constraints have changed and need to be audited and fixed?
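The two copy strategies being compared can be sketched as follows. This is a hedged illustration only: plain Python lists stand in for paddle tensors, and the function names are invented for clarity, though `num_running_requests` and the buffer names mirror the PR:

```python
def copy_partial(buffer, src, num_running_requests):
    """Slice-copy path: only refresh the slots of the requests scheduled
    this round (the real_bsz-style behaviour the PR moved away from)."""
    buffer[:num_running_requests] = src[:num_running_requests]
    return buffer


def copy_full(buffer, src):
    """Full-copy path chosen on GCU: refresh the whole buffer so ops that
    depend on the full shape of seq_lens_this_time see consistent data."""
    buffer[:] = src
    return buffer
```

With the partial copy, slots past `num_running_requests` keep stale values, which is only safe if every consumer respects the same front-packed layout; the full copy avoids that constraint at the cost of copying unused slots.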
Force-pushed from 81eef32 to f4d1206
Force-pushed from f4d1206 to c10d2b5
Enable GCU CI