Hi, Dr. Ghysels,
I have run into some issues when using the multi-GPU feature of STRUMPACK to solve a sparse matrix. I built STRUMPACK successfully with SLATE and MAGMA support.
- When I run the STRUMPACK test cases with "make test", sparse_mpi and reuse_structure_mpi both fail:
# multifrontal factorization:
# - estimated memory usage (exact solver) = 0.178864 MB
# - minimum pivot, sqrt(eps)*|A|_1 = 1.05367e-08
# - replacing of small pivots is not enabled
CUDA assertion failed: invalid resource handle ~/STRUMPACK-v8.0.0/STRUMPACK-8.0.0/src/dense/CUDAWrapper.cu 114
[gpu01:2817703] *** Process received signal ***
However, it passes when I run with one GPU:
OMP_NUM_THREADS=1 mpirun -n 1 test_structure_reuse_mpi pde900.mtx
- Random failures when solving a sparse matrix with STRUMPACK on multiple GPUs
Example: I tried using 2 GPUs:
mpirun -n 2 --mca pml ucx myApplication.exe
a) sometimes it passes
OMP: Info #277: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
# DenseMPI factorization complete, GPU=1, P=2, T=10: 0.170223 seconds, 0.00550864 GFLOPS, 0.0323613 GFLOP/s, ds=203, du=0
(Why is GPU=1 here? Does it mean that only one GPU is used, even though two processes are running on the GPUs I requested?)
b) sometimes it fails with this error message:
# multifrontal factorization:
# - estimated memory usage (exact solver) = 23.5596 MB
# - minimum pivot, sqrt(eps)*|A|_1 = 1.05367e-08
# - replacing of small pivots is not enabled
cuSOLVER assertion failed: 6 ~/STRUMPACK-v8.0.0/STRUMPACK-8.0.0/src/dense/CUDAWrapper.cpp 614
CUSOLVER_STATUS_EXECUTION_FAILED
Do you know what could be causing these issues, and how I should resolve them?
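To help narrow this down, here is a minimal sketch for printing which GPU index each MPI rank would map to in the 2-GPU run above. It is only a diagnostic under stated assumptions, not a confirmed fix: `pick_gpu` and `NUM_GPUS` are hypothetical names introduced for illustration, the round-robin mapping is an assumption about the intended layout, and `OMPI_COMM_WORLD_LOCAL_RANK` is only set when the launcher is Open MPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of STRUMPACK): choose a GPU index for an
# MPI rank on this node using a simple round-robin mapping.
# $1 = local rank on the node, $2 = number of GPUs on the node
pick_gpu() {
    echo $(( $1 % $2 ))
}

# Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK for every process it launches.
# Fall back to 0 so the script also runs outside of mpirun.
local_rank="${OMPI_COMM_WORLD_LOCAL_RANK:-0}"

# Assumption: 2 GPUs per node, matching the run above. Override with NUM_GPUS.
num_gpus="${NUM_GPUS:-2}"

gpu="$(pick_gpu "$local_rank" "$num_gpus")"
echo "local rank $local_rank -> GPU $gpu"
```

Saved under a hypothetical name such as show_gpu.sh, it could be launched the same way as the application (mpirun -n 2 --mca pml ucx sh ./show_gpu.sh) to see the mapping per rank. Exporting CUDA_VISIBLE_DEVICES with that index before starting the application would restrict each rank to a single GPU; whether that changes the failures reported here is not known.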
Best,
-Jing