Compiling generalized histograms for GPU
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Compiling generalized histograms for GPU. / Henriksen, Troels; Hellfritzsch, Sune; Sadayappan, Ponnuswamy; Oancea, Cosmin.
Proceedings of SC 2020: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020. 9355244.
RIS
TY - GEN
T1 - Compiling generalized histograms for GPU
AU - Henriksen, Troels
AU - Hellfritzsch, Sune
AU - Sadayappan, Ponnuswamy
AU - Oancea, Cosmin
PY - 2020
Y1 - 2020
N2 - We present and evaluate an implementation technique for histogram-like computations on GPUs that ensures work-efficient asymptotic cost, support for arbitrary associative and commutative operators, and efficient use of hardware-supported atomic operations when applicable. Based on a systematic empirical examination of the design space, we develop a technique that balances conflict rates and memory footprint. We demonstrate our technique both as a library implementation in CUDA and by extending the parallel array language Futhark with a new construct for expressing generalized histograms, supported by several compiler optimizations. We show that our histogram implementation, taken in isolation, outperforms similar primitives from CUB, and that it is competitive with or outperforms the hand-written code of several application benchmarks, even when the latter is specialized for a class of datasets.
AB - We present and evaluate an implementation technique for histogram-like computations on GPUs that ensures work-efficient asymptotic cost, support for arbitrary associative and commutative operators, and efficient use of hardware-supported atomic operations when applicable. Based on a systematic empirical examination of the design space, we develop a technique that balances conflict rates and memory footprint. We demonstrate our technique both as a library implementation in CUDA and by extending the parallel array language Futhark with a new construct for expressing generalized histograms, supported by several compiler optimizations. We show that our histogram implementation, taken in isolation, outperforms similar primitives from CUB, and that it is competitive with or outperforms the hand-written code of several application benchmarks, even when the latter is specialized for a class of datasets.
KW - functional programming
KW - GPU
KW - parallelism
UR - http://www.scopus.com/inward/record.url?scp=85102348343&partnerID=8YFLogxK
U2 - 10.1109/SC41405.2020.00101
DO - 10.1109/SC41405.2020.00101
M3 - Article in proceedings
AN - SCOPUS:85102348343
BT - Proceedings of SC 2020
PB - IEEE
T2 - 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020
Y2 - 9 November 2020 through 19 November 2020
ER -
ID: 258659299