Skip to content

Fix ParILUT threshold select for larger matrices with GPU backends #1877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

nbeams
Copy link
Collaborator

@nbeams nbeams commented Jul 2, 2025

This PR addresses an issue that can occur with the threshold_select kernel of ParILUT, leading to non-deterministic behavior and different output than the CPU backends. It only happens when the matrix is large enough to end up inside the while (bucket.size > kernel::basecase_size) loop, which is why it isn't triggered by any of the unit tests. However, if I adjust the test_select calls in test/factorization/par_ilut_kernels.cpp to use a bigger matrix, I get

[----------] Global test environment tear-down
[==========] 256 tests from 16 test suites ran. (156180 ms total)
[  PASSED  ] 240 tests.
[  FAILED  ] 16 tests, listed below:
[  FAILED  ] ParIlut/<gko::half, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<gko::half,int>
[  FAILED  ] ParIlut/<gko::half, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<gko::half,long>
[  FAILED  ] ParIlut/<gko::bfloat16, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<gko::bfloat16,int>
[  FAILED  ] ParIlut/<gko::bfloat16, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<gko::bfloat16,long>
[  FAILED  ] ParIlut/<double, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<double,int>
[  FAILED  ] ParIlut/<double, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<double,long>
[  FAILED  ] ParIlut/<float, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<float,int>
[  FAILED  ] ParIlut/<float, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<float,long>
[  FAILED  ] ParIlut/<std::complex<gko::half>, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<std::complex<gko::half>,int>
[  FAILED  ] ParIlut/<std::complex<gko::half>, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<std::complex<gko::half>,long>
[  FAILED  ] ParIlut/<std::complex<gko::bfloat16>, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<std::complex<gko::bfloat16>,int>
[  FAILED  ] ParIlut/<std::complex<gko::bfloat16>, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<std::complex<gko::bfloat16>,long>
[  FAILED  ] ParIlut/<std::complex<double>, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<std::complex<double>,int>
[  FAILED  ] ParIlut/<std::complex<double>, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<std::complex<double>,long>
[  FAILED  ] ParIlut/<std::complex<float>, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<std::complex<float>,int>
[  FAILED  ] ParIlut/<std::complex<float>, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<std::complex<float>,long>

when testing for CUDA.

After applying the changes in this branch:

[----------] Global test environment tear-down
[==========] 256 tests from 16 test suites ran. (156397 ms total)
[  PASSED  ] 256 tests.

I haven't tested the dpcpp version, but this part of the code is the same, so I updated it as well.

Note, this doesn't change the default behavior of ParILUT; it will only matter if the user turns off approximate_select.

@nbeams nbeams added mod:cuda This is related to the CUDA module. 1:ST:ready-for-review This PR is ready for review mod:hip This is related to the HIP module. mod:dpcpp This is related to the DPC++ module. is:bugfix This fixes a bug labels Jul 2, 2025
@ginkgo-bot ginkgo-bot added the type:factorization This is related to the Factorizations label Jul 2, 2025
Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also change the tests so that we have the original issue? The tests you posted didn't show a large difference regarding their runtime.

@nbeams
Copy link
Collaborator Author

nbeams commented Jul 3, 2025

Since the test is using gko::test::generate_random_lower_triangular_matrix, with std::uniform_int_distribution to determine the number of non-zeros per row, I believe that means we can't guarantee we will always get the same number of non-zeros when running the test on different systems, right? That's what I want -- a way to guarantee the final nnz is greater than a certain value. What would be the preferred way to handle that? Should I just make the bounds of the uniform_int_distribution the same so I can know exactly how many non-zeros my triangular matrix will have? And then only use this for the threshold select tests? Or is there a better way to do that within the Ginkgo testing framework?

@MarcelKoch
Copy link
Member

I think what you describe, i.e. increasing the lower bound for the generator, is just fine. You could just have two different distributions, if you want to use less nnz for the other tests.

@nbeams nbeams force-pushed the fix-threshold-select-gpu branch from 847b4dc to be8d2bb Compare July 16, 2025 22:44
@nbeams
Copy link
Collaborator Author

nbeams commented Jul 16, 2025

Figuring out what sort of combination of options would produce a matrix to trigger the bug (without resorting to a super large matrix) ended up being more challenging than I expected. I based the range of values on the nine point stencil example (which I originally modified to be my tester for finding/testing this bug) and ended up winging it a bit from there. But here is the output after 579e996 (testing on ICL guyot):

[----------] Global test environment tear-down
[==========] 256 tests from 16 test suites ran. (50639 ms total)
[  PASSED  ] 248 tests.
[  FAILED  ] 8 tests, listed below:
[  FAILED  ] ParIlut/<gko::half, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<gko::half,int>
[  FAILED  ] ParIlut/<gko::half, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<gko::half,long>
[  FAILED  ] ParIlut/<gko::bfloat16, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<gko::bfloat16,int>
[  FAILED  ] ParIlut/<gko::bfloat16, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<gko::bfloat16,long>
[  FAILED  ] ParIlut/<double, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<double,int>
[  FAILED  ] ParIlut/<double, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<double,long>
[  FAILED  ] ParIlut/<float, int>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<float,int>
[  FAILED  ] ParIlut/<float, long>.KernelThresholdSelectIsEquivalentToRef, where TypeParam = std::tuple<float,long>

 8 FAILED TESTS

And after be8d2bb,

[----------] Global test environment tear-down
[==========] 256 tests from 16 test suites ran. (50456 ms total)
[  PASSED  ] 256 tests.

I also realized that I never added myself to the contributors list. Should I do that here? Or, looking at the list, perhaps it is time for a general update for emails/affiliations that have changed -- which could be a separate PR just for that?

@MarcelKoch
Copy link
Member

Yes, please add yourself to the contributors list. The rest should be updated in a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-for-review This PR is ready for review is:bugfix This fixes a bug mod:cuda This is related to the CUDA module. mod:dpcpp This is related to the DPC++ module. mod:hip This is related to the HIP module. type:factorization This is related to the Factorizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants