Inclusive scan on GPU

This is an experiment in benchmarking inclusive scan kernels on GPU. I implemented:

A double buffer kernel
A work efficient kernel. This kernel implements a work-efficient scan algorithm but suffers from thread divergence and memory bank conflicts
A conflict-free kernel. It uses padding to remedy shared memory bank conflicts.
A conflict-free-swizzle kernel. It uses swizzling instead of padding to fix bank conflicts.
A scan-on registers kernel. It uses warp synchronization primitives to compute prefix sums and avoid excessive block-thread synchronizations.
Two decoupled-lookback implementations using previous techniques. The first implementation does a parallel lookback using a block of threads, whereas the latter uses a single warp for the lookback.

Based on https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda I also used these slides: https://people.cs.vt.edu/yongcao/teaching/cs5234/spring2013/slides/Lecture10.pdf

Timings

The experiment is conducted on a A100 GPU.

Scan Method	Number of Passes	Time (us)
Double Buffer	3	501.477
Single Buffer	3	582.766
Work Efficient	3	488.564
Conflict Free Padding	3	448.487
Swizzle	3	436.744
On Registers	3	433.118
Block Lookback	1	380.802
Warp Lookback	1	277.896
Thrust	1	269.444

Inclusive Scan

void inclusive_scan(float const *in, float *out, size_t input_size) {
  out[0] = in[0];
  for (size_t i = 1; i < input_size; i++)
    out[i] = out[i-1] + in[i];
}

`

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md
common.cuh		common.cuh
decoupled_lookback.cuh		decoupled_lookback.cuh
drawings.org		drawings.org
scan.cu		scan.cu
three_pass.cuh		three_pass.cuh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Inclusive scan on GPU

Timings

Inclusive Scan

About

Uh oh!

Releases

Packages

Languages

ishovkun/CudaScan

Folders and files

Latest commit

History

Repository files navigation

Inclusive scan on GPU

Timings

Inclusive Scan

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages