Question about memory management for custom cuda.parallel operators #4724
-
I've been working with the cuda.parallel module and noticed some memory usage patterns that I'm trying to better understand. When implementing custom operators, what's the recommended approach for managing temporary GPU memory allocations to avoid memory leaks?
Replies: 3 comments
-
Hi @KalyanChakravarthyKodela - thanks for your question. If possible, it would be helpful to see an example of the kind of operations you're doing that lead to memory leaks. Are you able to share a representative code snippet or example?
-
Hi @shwina, thanks for your response!
-
RAII-based allocation on the host is the best practice. Be aware that `cudaFree` may perform implicit synchronization. Consider using the faster stream-ordered allocation API: `cudaMallocAsync`/`cudaFreeAsync`. See https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-1/ and https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-2/ for more details.
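
For concreteness, here's a minimal sketch of what that can look like: an RAII wrapper that pairs `cudaMallocAsync` with `cudaFreeAsync` on a given stream, so temporaries are released in stream order rather than through a synchronizing `cudaFree`. The `StreamBuffer` class name is hypothetical and just for illustration; it is not part of cuda.parallel or the CUDA runtime.

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <stdexcept>

// Hypothetical RAII wrapper for illustration only; not part of
// cuda.parallel or the CUDA runtime.
class StreamBuffer {
public:
    StreamBuffer(std::size_t bytes, cudaStream_t stream) : stream_(stream) {
        // Stream-ordered allocation (CUDA 11.2+): ordered with respect
        // to other work enqueued on `stream`.
        if (cudaMallocAsync(&ptr_, bytes, stream_) != cudaSuccess) {
            throw std::runtime_error("cudaMallocAsync failed");
        }
    }

    // Stream-ordered free: unlike cudaFree, this does not implicitly
    // synchronize the device.
    ~StreamBuffer() { cudaFreeAsync(ptr_, stream_); }

    StreamBuffer(const StreamBuffer&) = delete;
    StreamBuffer& operator=(const StreamBuffer&) = delete;

    void* get() const { return ptr_; }

private:
    void* ptr_ = nullptr;
    cudaStream_t stream_;
};

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    {
        StreamBuffer tmp(1 << 20, stream);  // 1 MiB temporary buffer
        // ... enqueue kernels on `stream` that use tmp.get() ...
    }  // destructor enqueues cudaFreeAsync on `stream` here
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```

Because the free is issued on the same stream that used the buffer, the deallocation is ordered after any kernels reading it, without stalling the rest of the device.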