This project benchmarks three optimized implementations of the Floyd-Warshall all-pairs shortest-paths algorithm on an 8-core CPU: a naive serial version, a cache-friendly tiled version, and an OpenMP-parallelized version with cache-friendly tiling. The naive serial version serves as the baseline for all comparisons, and my CPU optimizations achieved over a 6x speedup on 8 cores. I also implemented a CUDA-accelerated GPU version of the algorithm that uses GPU global memory and massive block- and thread-level parallelism, achieving a 20x speedup over the naive serial implementation. All benchmarks were run across many graph sizes, and every result was verified against the naive implementation to ensure correctness. For more information on methods, results, and background, read my paper by clicking this LINK.
- Serial implementation for baseline benchmarking and results verification
- Cache-friendly blocked/tiled implementation for improved cache locality and performance on CPU
- Blocked implementation parallelized with OpenMP for multi-core CPU acceleration along with improved cache locality (6x speedup)
- CUDA kernel with massive block- and thread-level parallelism for GPU-accelerated performance (20x speedup)
- Automatic timing across 9 graph sizes (4×4 to 3072×3072)
- Output includes execution times printed in CSV format and the maximum error relative to the ground truth for result verification
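For reference, the naive serial baseline is the textbook triple loop: for each intermediate vertex `k`, relax every `(i, j)` pair. The sketch below is a minimal illustration of that baseline; the function name and flat row-major layout are my assumptions, not necessarily the repository's code.

```c
/* Naive serial Floyd-Warshall baseline (illustrative sketch).
   dist is an n*n row-major matrix of edge weights, with a large
   value for "no edge" and 0 on the diagonal. */
void floyd_warshall_serial(float *dist, int n)
{
    for (int k = 0; k < n; k++)           /* intermediate vertex */
        for (int i = 0; i < n; i++)       /* source */
            for (int j = 0; j < n; j++) { /* destination */
                float via_k = dist[i * n + k] + dist[k * n + j];
                if (via_k < dist[i * n + j])
                    dist[i * n + j] = via_k;
            }
}
```

Because every iteration touches full rows of the matrix, this version has poor cache reuse at large `n`, which is what the tiled variants address.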
```sh
gcc -O1 -fopenmp parallel_fw.c -lrt -o parallel_fw
OMP_NUM_THREADS=4 ./parallel_fw
```
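The OpenMP variant combines tiling with thread parallelism. A simplified sketch of the idea is below: for a fixed `k`, different rows can be updated independently, so tiles are distributed across threads. This is my illustrative simplification; the repository's blocked implementation (and its tile size) may differ, e.g. by also blocking the `k` loop.

```c
#define TILE 64 /* illustrative tile size, not the repo's value */

/* Tiled Floyd-Warshall with OpenMP (sketch). For each fixed k,
   row tiles are independent, so the ii loop is parallelized. */
void floyd_warshall_tiled_omp(float *dist, int n)
{
    for (int k = 0; k < n; k++) {
        #pragma omp parallel for schedule(static)
        for (int ii = 0; ii < n; ii += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                for (int i = ii; i < ii + TILE && i < n; i++) {
                    float dik = dist[i * n + k]; /* reused across the j tile */
                    for (int j = jj; j < jj + TILE && j < n; j++) {
                        float via_k = dik + dist[k * n + j];
                        if (via_k < dist[i * n + j])
                            dist[i * n + j] = via_k;
                    }
                }
    }
}
```

The tiling keeps the working set of `dist[i*n+j]` and `dist[k*n+j]` small enough to stay in cache, while the `parallel for` spreads row tiles over the available cores (set via `OMP_NUM_THREADS`).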
```sh
# Use this for a K40m (change -arch accordingly for your GPU architecture)
nvcc -arch=sm_35 parallel_fw_cuda.cu -o parallel_fw_cuda
./parallel_fw_cuda
# Use nvidia-smi to find relevant information on your GPU's architecture
```
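The GPU version's structure can be sketched as one thread per `(i, j)` matrix entry, with one kernel launch per intermediate vertex `k`, operating directly on global memory. The names and launch configuration below are my assumptions for illustration, not the repository's exact kernel.

```cuda
// Illustrative CUDA kernel: each thread relaxes one (i, j) entry
// of the distance matrix in global memory for the current k.
__global__ void fw_kernel(float *dist, int n, int k)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && j < n) {
        float via_k = dist[i * n + k] + dist[k * n + j];
        if (via_k < dist[i * n + j])
            dist[i * n + j] = via_k;
    }
}

// Host side (sketch): d_dist is the n*n matrix already in device
// memory; the kernel is launched once per intermediate vertex k,
// and kernel-launch ordering serializes the k iterations.
void floyd_warshall_cuda(float *d_dist, int n)
{
    dim3 block(16, 16); // illustrative block shape
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    for (int k = 0; k < n; k++)
        fw_kernel<<<grid, block>>>(d_dist, n, k);
    cudaDeviceSynchronize();
}
```

The sequential `k` loop stays on the host because iteration `k` depends on the results of iteration `k-1`, while the `n * n` relaxations within each iteration are fully parallel across blocks and threads.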