Compiling generalized histograms for GPU

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

We present and evaluate an implementation technique for histogram-like computations on GPUs that ensures work-efficient asymptotic cost, support for arbitrary associative and commutative operators, and efficient use of hardware-supported atomic operations when applicable. Based on a systematic empirical examination of the design space, we develop a technique that balances conflict rates and memory footprint. We demonstrate our technique both as a library implementation in CUDA and as an extension of the parallel array language Futhark with a new construct for expressing generalized histograms, supported by several compiler optimizations. We show that our histogram implementation, taken in isolation, outperforms similar primitives from CUB, and that it is competitive with or outperforms the hand-written code of several application benchmarks, even when the latter is specialized for a class of datasets.
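To make the term "generalized histogram" concrete, the following is a minimal sequential sketch in Python of what such a computation returns: each bin holds the reduction, under a user-supplied associative and commutative operator, of all values mapped to that bin. It is not the GPU technique from the paper. The function names `generalized_histogram` and `chunked_histogram` are hypothetical, and ignoring out-of-range indices is an assumption made for this illustration. The second function shows one way to trade memory footprint against update conflicts, by giving each chunk of the input a private sub-histogram and combining them afterwards; whether the paper uses this exact scheme is not stated in the abstract.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def generalized_histogram(
    op: Callable[[T, T], T],
    neutral: T,
    num_bins: int,
    indices: Sequence[int],
    values: Sequence[T],
) -> List[T]:
    """Sequential reference: hist[b] is the op-reduction of all values with index b."""
    hist = [neutral] * num_bins
    for i, v in zip(indices, values):
        # Assumption for this sketch: out-of-range indices are silently ignored.
        if 0 <= i < num_bins:
            hist[i] = op(hist[i], v)
    return hist


def chunked_histogram(
    op: Callable[[T, T], T],
    neutral: T,
    num_bins: int,
    indices: Sequence[int],
    values: Sequence[T],
    num_chunks: int,
) -> List[T]:
    """Build one private sub-histogram per input chunk, then combine them with op.

    More chunks mean fewer updates competing for the same bin, at the cost of
    num_chunks * num_bins elements of memory. The result equals the sequential
    one because op is associative and commutative.
    """
    n = len(indices)
    step = max(1, -(-n // num_chunks))  # ceiling division
    partials = [
        generalized_histogram(op, neutral, num_bins,
                              indices[s:s + step], values[s:s + step])
        for s in range(0, n, step)
    ]
    result = [neutral] * num_bins
    for partial in partials:
        result = [op(a, b) for a, b in zip(result, partial)]
    return result
```

With `op` set to addition and `neutral` set to 0 this computes an ordinary weighted histogram; with `max` it keeps the largest value seen per bin. On a GPU, addition on machine integers can typically map to a hardware atomic, while an arbitrary operator needs a fallback such as a compare-and-swap loop or a lock, which is the distinction the abstract refers to.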

Original language: English
Title of host publication: Proceedings of SC 2020: International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE
Publication date: 2020
Article number: 9355244
ISBN (Electronic): 9781728199986
Publication status: Published - 2020
Event: 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020 - Virtual, Atlanta, United States
Duration: 9 Nov 2020 to 19 Nov 2020

Conference

Conference: 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020
Country: United States
City: Virtual, Atlanta
Period: 09/11/2020 to 19/11/2020
Sponsor: ACM's Special Interest Group on High Performance Computing (SIGHPC), Association for Computing Machinery, IEEE Computer Society, IEEE's Technical Committee on High Performance Computing (TCHPC)

    Research areas

  • functional programming, GPU, parallelism

ID: 258659299