CUDPP is the CUDA Data Parallel Primitives Library, a library of data-parallel algorithm primitives such as parallel prefix-sum (“scan”), parallel sort, and parallel reduction.
Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables. CUDPP is designed to run on CUDA-capable processors.
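To make the role of scan as a building block concrete, here is a minimal sequential sketch in Python of an exclusive prefix-sum and of stream compaction built on top of it. It does not use the CUDPP API and is not how the library implements these primitives on the GPU; the function names exclusive_scan and compact are hypothetical and chosen only for illustration.

```python
def exclusive_scan(values, op=lambda a, b: a + b, identity=0):
    """Sequential reference for an exclusive scan.

    out[i] combines values[0..i-1] with `op`; out[0] is the identity
    element of `op` (0 for addition).
    """
    out = []
    running = identity
    for v in values:
        out.append(running)       # result excludes the current element
        running = op(running, v)
    return out


def compact(values, keep):
    """Stream compaction: keep only elements for which keep(v) is true.

    A scan over the 0/1 flags gives each surviving element its output
    index, which is what allows a parallel version to scatter
    elements independently.
    """
    if not values:
        return []
    flags = [1 if keep(v) else 0 for v in values]
    offsets = exclusive_scan(flags)
    total = offsets[-1] + flags[-1]   # number of surviving elements
    out = [None] * total
    for v, f, o in zip(values, flags, offsets):
        if f:
            out[o] = v                # scatter to the scanned position
    return out


data = [3, 1, 7, 0, 4, 1, 6, 3]
print(exclusive_scan(data))                  # [0, 3, 4, 11, 11, 15, 16, 22]
print(compact(data, lambda v: v % 2 == 0))   # [0, 4, 6]
```

The op and identity parameters are included because a scan can use operators other than addition; for a min scan, for example, the identity would have to be a value no smaller than any input.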
CUDPP was initially developed to test the algorithms implemented in C for CUDA for the articles “Parallel Prefix Sum (Scan) in CUDA”, by Mark Harris, Shubho Sengupta, and John Owens (published in GPU Gems 3), and “Scan Primitives for GPU Computing”, by Shubho Sengupta, Mark Harris, Yao Zhang, and John Owens (published in the proceedings of Graphics Hardware 2007).
What's New in This Release:
· Fix scan, segmented scan, and radix sort correctness on Fermi (sm_20) architecture GPUs (proper use of the "volatile" keyword)
· Some initial small optimizations for radix sort and scan on Fermi (sm_20) architecture
· Fix emulation mode radix sort of very small arrays
· Fix radix sort on 64-bit OSes by using __launch_bounds__ in CUDA 3.0
· Minor efficiency improvement to radix sort test in cudpp_testrig
· Fix incorrect identity for min operator
· Fixes for Unix and Mac OS X Snow Leopard builds
· Fixes for 64-bit Windows builds
· Bibliography updates
· Minor documentation fixes