[GCU] Enable gcu CI #3190


Merged: 2 commits merged into PaddlePaddle:develop from enable_gcu_ci on Aug 13, 2025
Conversation

EnflameGCU (Contributor):

Enable GCU CI.

paddle-bot bot commented Aug 4, 2025

Thanks for your contribution!

@paddle-bot added the contributor (External developers) label on Aug 4, 2025
@EnflameGCU force-pushed the enable_gcu_ci branch 20 times, most recently from 03f8018 to 00a565f, on August 7, 2025 at 10:03
yongqiangma (Collaborator) previously approved these changes on Aug 8, 2025, leaving a comment:

LGTM

@EnflameGCU force-pushed the enable_gcu_ci branch 5 times, most recently from f75d8eb to 5d9aa0c, on August 8, 2025 at 08:05
yongqiangma previously approved these changes on Aug 11, 2025
@@ -675,7 +675,7 @@ def initialize_attn_backend(self) -> None:
)
self.share_inputs["decoder_batch_ids"] = paddle.full([int(decode_max_tile_size)], 0, dtype="int32")
self.share_inputs["decoder_tile_ids_per_batch"] = paddle.full([int(decode_max_tile_size)], 0, dtype="int32")
self.share_inputs["decoder_num_blocks_cpu"] = paddle.full([1], 0, dtype="int32").pin_memory()
Collaborator:

Why was this deleted outright? After the deletion this becomes a GPU tensor. If pinned memory isn't used, shouldn't .cpu() at least be added?
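A minimal sketch of the placement distinction the reviewer is pointing at, assuming standard Paddle tensor placement APIs; the variable names are illustrative only:

import paddle

# With a GPU (or other accelerator) as the current device, paddle.full
# allocates on that device unless a host placement is forced.
on_device = paddle.full([1], 0, dtype="int32")            # device tensor

# The original code pinned the tensor in page-locked host memory, which
# is what fast host<->device transfers typically expect.
pinned = paddle.full([1], 0, dtype="int32").pin_memory()  # pinned host tensor

# The reviewer's fallback: if pinned memory isn't used, at least keep
# the tensor on the host explicitly.
on_cpu = paddle.full([1], 0, dtype="int32").cpu()         # pageable host tensor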

Collaborator:

This tensor is ultimately used in the get_block_shape_and_split_kv_block kernel.

EnflameGCU (Contributor, Author):

Done

-        self.seq_lens_this_time_buffer[:num_running_requests].copy_(
-            self.share_inputs["seq_lens_this_time"][:num_running_requests], False
-        )
+        self.seq_lens_this_time_buffer.copy_(self.share_inputs["seq_lens_this_time"], False)
Collaborator:

What is the reason for this change?

EnflameGCU (Contributor, Author):

1. This mainly accompanies the change at line 300, so that the complete data is updated:
   self.share_inputs["seq_lens_this_time"] = self.seq_lens_this_time_buffer

2. Why real_bsz is not adopted on GCU for now: the AttentionBackend and the pre-/post-processing ops (update_inputs_gcu, set_value_by_flags_and_idx_gcu, etc.) perform some operations based on the shape of seq_lens_this_time, so this should first be refactored in a unified way.

3. Possible impact of the real_bsz change on GCU (sketched below):

   - For the scheduling system, it presumably needs to be guaranteed that the num_running_requests requests scheduled in this round are packed at the front of the whole task list?
   - The constraints have changed for everything that uses seq_lens_this_time (custom ops, etc.), which needs to be audited and fixed?
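A minimal sketch of the two copy patterns discussed above, with hypothetical sizes; the buffer names mirror the diff, everything else is made up for illustration:

import paddle

max_num_seqs, num_running_requests = 8, 3  # hypothetical sizes

share_inputs = {"seq_lens_this_time": paddle.zeros([max_num_seqs], dtype="int32")}
seq_lens_this_time_buffer = paddle.ones([max_num_seqs], dtype="int32")

# real_bsz style (GPU path): refresh only the first num_running_requests
# slots, which assumes the scheduler packs the active requests at the
# front of the task list (point 3 above).
seq_lens_this_time_buffer[:num_running_requests].copy_(
    share_inputs["seq_lens_this_time"][:num_running_requests], False
)

# GCU path in this PR: copy the full buffer, so ops that key off the full
# shape of seq_lens_this_time see consistent data, then rebind as at line 300.
seq_lens_this_time_buffer.copy_(share_inputs["seq_lens_this_time"], False)
share_inputs["seq_lens_this_time"] = seq_lens_this_time_buffer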

@Jiang-Jia-Jun merged commit d1a92e3 into PaddlePaddle:develop on Aug 13, 2025
12 of 14 checks passed
Labels: contributor (External developers)