Skip to content

Use event to handle distributed row gatherer #1882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

yhmtsai
Copy link
Member

@yhmtsai yhmtsai commented Jul 8, 2025

With the current distributed row gatherer , we does the following:
local row gatherer (prepare data for mpi) -> synchronize (it is required by mpi) -> submit mpi ialltoallv -> submit local spmv
Ginkgo will do local spmv and mpi ialltoallv.
However, the local spmv is forced to wait for synchronize() and then submit its kernel, which lead the gap between gpu activity.

Another approach is introduced in this pr to use event
local row gatherer -> record event -> submit local spmv -> wait the event -> submit mpi ialltoallv.
the submission local spmv does not need to wait for local row gatherer to finish, but it will give additional overhead during record event.

For those does not support AsyncEvent, we just synchronize when creating it.

something from the profiler
image

The local spmv starts earlier (yellow box for gpu activity and gree for cpu). The MPI communication starts later now (pink box). It gives better performance because local spmv still covers the mpi communication. If mpi communication is not covered by the local spmv, this will give slower performance. I will say it is okay-ish now. This situation means it is network bandwidth bound now, which is slow compared to other bandwidth.
We can improve it by introducing thread but it will lead another discussion whether we allow thread in ReferenceExecutor.

There are two approach might also help this situation - stream-aware mpi or async execution.
stream-aware mpi needs more study like using isend/irecv implement all_to_all_v + row gatherer.
async execution I have done something for async schwarz which touching some design question on executor, so it lead larger and longer discussion. Thus, I do not touch it here.

@yhmtsai yhmtsai requested a review from MarcelKoch July 8, 2025 14:41
@yhmtsai yhmtsai self-assigned this Jul 8, 2025
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. reg:benchmarking This is related to benchmarking. mod:all This touches all Ginkgo modules. labels Jul 8, 2025
@yhmtsai yhmtsai force-pushed the event_row_gatherer branch from 2daf422 to a3f291a Compare July 8, 2025 16:45
@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Jul 10, 2025
@yhmtsai yhmtsai force-pushed the event_row_gatherer branch 2 times, most recently from 846d8d4 to d65428b Compare July 11, 2025 13:49
@yhmtsai yhmtsai force-pushed the event_row_gatherer branch from d65428b to 85dfc7b Compare August 14, 2025 22:19
@ginkgo-bot
Copy link
Member

Error: The following files need to be formatted:

omp/CMakeLists.txt

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-for-review This PR is ready for review mod:all This touches all Ginkgo modules. reg:benchmarking This is related to benchmarking. reg:build This is related to the build system. reg:testing This is related to testing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants