-
Notifications
You must be signed in to change notification settings - Fork 98
Fix ParILUT threshold select for larger matrices with GPU backends #1877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also change the tests so that we have the original issue? The tests you posted didn't show a large difference regarding their runtime.
Since the test is using |
I think what you describe, i.e. increasing the lower bound for the generator, is just fine. You could just have two different distributions, if you want to use less nnz for the other tests. |
847b4dc
to
be8d2bb
Compare
Figuring out what sort of combination of options would produce a matrix to trigger the bug (without resorting to a super large matrix) ended up being more challenging than I expected. I based the range of values on the nine point stencil example (which I originally modified to be my tester for finding/testing this bug) and ended up winging it a bit from there. But here is the output after 579e996 (testing on ICL guyot):
And after be8d2bb,
I also realized that I never added myself to the contributors list. Should I do that here? Or, looking at the list, perhaps it is time for a general update for emails/affiliations that have changed -- which could be a separate PR just for that? |
Yes, please add yourself to the contributors list. The rest should be updated in a separate PR. |
This PR addresses an issue that can occur with the
threshold_select
kernel of ParILUT, leading to non-deterministic behavior and different output than the CPU backends. It only happens when the matrix is large enough to end up inside thewhile (bucket.size > kernel::basecase_size)
loop, which is why it isn't triggered by any of the unit tests. However, if I adjust thetest_select
calls intest/factorization/par_ilut_kernels.cpp
to use a bigger matrix, I getwhen testing for CUDA.
After applying the changes in this branch:
I haven't tested the dpcpp version, but this part of the code is the same, so I updated it as well.
Note, this doesn't change the default behavior of ParILUT; it will only matter if the user turns off
approximate_select
.