Question about memory management for custom cuda.parallel operators #4724

RAII-based allocation on the host is the best practice. Be aware that cudaFree may perform an implicit synchronization. Consider using the faster stream-ordered allocation API instead: cudaMallocAsync/cudaFreeAsync.

See https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-1/ and https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-2/ for more details.
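
To make the suggestion concrete, here is a minimal sketch of a host-side RAII wrapper around the stream-ordered allocator. The names `StreamBuffer`, `check`, and `round_trip` are hypothetical and are not part of CUB or cuda.parallel. The sketch assumes CUDA 11.2 or newer (where cudaMallocAsync/cudaFreeAsync are available) and a device that supports memory pools.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: turn a CUDA error code into an exception.
inline void check(cudaError_t err, const char* what) {
  if (err != cudaSuccess) {
    throw std::runtime_error(std::string(what) + ": " + cudaGetErrorString(err));
  }
}

// Hypothetical RAII owner of a stream-ordered device allocation.
// The memory is allocated and freed in the order of `stream`.
class StreamBuffer {
public:
  StreamBuffer(std::size_t bytes, cudaStream_t stream)
      : stream_(stream), bytes_(bytes) {
    check(cudaMallocAsync(&ptr_, bytes, stream), "cudaMallocAsync");
  }

  ~StreamBuffer() {
    if (ptr_ != nullptr) {
      // Enqueue the free on the same stream; the error code is
      // deliberately ignored because destructors must not throw.
      cudaFreeAsync(ptr_, stream_);
    }
  }

  // Non-copyable, move-constructible: exactly one owner per allocation.
  StreamBuffer(const StreamBuffer&) = delete;
  StreamBuffer& operator=(const StreamBuffer&) = delete;
  StreamBuffer(StreamBuffer&& other) noexcept
      : ptr_(std::exchange(other.ptr_, nullptr)),
        stream_(other.stream_),
        bytes_(std::exchange(other.bytes_, 0)) {}
  StreamBuffer& operator=(StreamBuffer&&) = delete;

  void* data() const noexcept { return ptr_; }
  std::size_t size() const noexcept { return bytes_; }

private:
  void* ptr_ = nullptr;
  cudaStream_t stream_ = nullptr;
  std::size_t bytes_ = 0;
};

// Usage sketch: copy one int to a temporary device buffer and back.
int round_trip(int value) {
  cudaStream_t stream;
  check(cudaStreamCreate(&stream), "cudaStreamCreate");
  int result = 0;
  {
    StreamBuffer buf(sizeof(int), stream);
    check(cudaMemcpyAsync(buf.data(), &value, sizeof(int),
                          cudaMemcpyHostToDevice, stream), "H2D copy");
    check(cudaMemcpyAsync(&result, buf.data(), sizeof(int),
                          cudaMemcpyDeviceToHost, stream), "D2H copy");
  }  // cudaFreeAsync is enqueued here, after both copies in stream order
  check(cudaStreamSynchronize(stream), "cudaStreamSynchronize");
  check(cudaStreamDestroy(stream), "cudaStreamDestroy");
  return result;
}
```

Because the free is enqueued on the stream rather than executed immediately, the buffer can go out of scope while earlier work on that stream is still pending. This holds only for work submitted to the same stream; if other streams use the buffer, they would need to be ordered against it (for example with events) before the destructor runs.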

Answer selected by KalyanChakravarthyK
Category
CUB