Skip to content

Contains the C and CUDA files that feature my optimizations of the Floyd-Warshall all-pairs shortest path algorithm for a CPU and GPU. The C and CUDA files contain all necessary helper functions to properly time, initialize, compare and graph the results.

Notifications You must be signed in to change notification settings

jedelist/optimized-floyd-warshall

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Optimized Floyd-Warshall Algorithm (CPU + CUDA)

This project benchmarks three of my optimized implementations of the Floyd-Warshall all-pairs shortest paths algorithm for an 8-core CPU. The three versions are: naive serial, cache-friendly tiled, and OpenMP-parallelized with cache-friendly tiling. The naive serial version was used as the baseline benchmark for compparisons. My optimizations achieved over 6x speedup on an 8-core CPU. Also implemented a CUDA-accelerated GPU optimization of the algorithm using GPU global memory, massive block and thread-level parallelism. My GPU optimization achieved 20 times speedup over the naive serial implementation. All results and benchmarking were done across many graph sizes and all results verified using the standard naive implementation to ensure correctness. For more information on methods, results and background, read my paper by clicking this LINK.

Features

  • Serial implementation for baseline benchmarking and results verification
  • Cache-friendly blocked/tiled implementation for improved cache locality and performance on CPU
  • Blocked implementation with OpenMP for parallel CPU acceleration along with improved cache locality (6x speedup)
  • CUDA kernel with massive block and thread-level paralleism for GPU-accelerated performance (20x speedup)
  • Automatic timing across 9 graph sizes (4×4 to 3072×3072)
  • Output includes execution time printed in CSV format, and max error compared to the ground truth for result verification

Build Instructions

CPU Optimization: parallel_fw.c

gcc -O1 -fopenmp parallel_fw.c -lrt -o parallel_fw
OMP_NUM_THREADS=4 ./parallel_fw

GPU Optimization: parallel_fw_cuda.cu

    # USE THIS FOR K40m: (change accordingly for your GPU architecture)
    nvcc -arch=sm_35 parallel_fw_cuda.cu -o parallel_fw_cuda 
    Run: ./parallel_fw_cuda 

    # Use nvidia-smi to find relevant information on your GPU's architecture

About

Contains the C and CUDA files that feature my optimizations of the Floyd-Warshall all-pairs shortest path algorithm for a CPU and GPU. The C and CUDA files contain all necessary helper functions to properly time, initialize, compare and graph the results.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published