Compiling generalized histograms for GPU

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Compiling generalized histograms for GPU. / Henriksen, Troels; Hellfritzsch, Sune; Sadayappan, Ponnuswamy; Oancea, Cosmin.

Proceedings of SC 2020: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020. 9355244.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Harvard

Henriksen, T, Hellfritzsch, S, Sadayappan, P & Oancea, C 2020, Compiling generalized histograms for GPU. in Proceedings of SC 2020: International Conference for High Performance Computing, Networking, Storage and Analysis., 9355244, IEEE, 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020, Virtual, Atlanta, United States, 09/11/2020. https://doi.org/10.1109/SC41405.2020.00101

APA

Henriksen, T., Hellfritzsch, S., Sadayappan, P., & Oancea, C. (2020). Compiling generalized histograms for GPU. In Proceedings of SC 2020: International Conference for High Performance Computing, Networking, Storage and Analysis [9355244] IEEE. https://doi.org/10.1109/SC41405.2020.00101

Vancouver

Henriksen T, Hellfritzsch S, Sadayappan P, Oancea C. Compiling generalized histograms for GPU. In Proceedings of SC 2020: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. 2020. 9355244 https://doi.org/10.1109/SC41405.2020.00101

Author

Henriksen, Troels ; Hellfritzsch, Sune ; Sadayappan, Ponnuswamy ; Oancea, Cosmin. / Compiling generalized histograms for GPU. Proceedings of SC 2020: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020.

Bibtex

@inproceedings{56c739dcb57a49d296445d361903bd10,
title = "Compiling generalized histograms for GPU",
abstract = "We present and evaluate an implementation technique for histogram-like computations on GPUs that ensures both work-efficient asymptotic cost, support for arbitrary associative and commutative operators, and efficient use of hardwaresupported atomic operations when applicable. Based on a systematic empirical examination of the design space, we develop a technique that balances conflict rates and memory footprint. We demonstrate our technique both as a library implementation in CUDA, as well as by extending the parallel array language Futhark with a new construct for expressing generalized histograms, and by supporting this construct with several compiler optimizations. We show that our histogram implementation taken in isolation outperforms similar primitives from CUB, and that it is competitive or outperforms the hand-written code of several application benchmarks, even when the latter is specialized for a class of datasets.",
keywords = "functional programming, GPU, parallelism",
author = "Troels Henriksen and Sune Hellfritzsch and Ponnuswamy Sadayappan and Cosmin Oancea",
year = "2020",
doi = "10.1109/SC41405.2020.00101",
language = "English",
booktitle = "Proceedings of SC 2020",
publisher = "IEEE",
note = "2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020 ; Conference date: 09-11-2020 Through 19-11-2020",

}
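
The abstract above mentions efficient use of hardware-supported atomic operations when the histogram operator maps onto one. As an illustration only, not the paper's implementation, the following minimal CUDA sketch shows the baseline approach: one thread per input element, combining each value into its bin with atomicAdd. The kernel and parameter names are hypothetical.

// Minimal sketch of a generalized histogram whose operator (here, +)
// maps directly onto a hardware atomic. Each thread reads one
// (bin index, value) pair and combines the value into its bin.
// Illustrative baseline only, not the paper's optimized scheme.
#include <cuda_runtime.h>

__global__ void hist_atomic_add(float *bins, int num_bins,
                                const int *indices, const float *values,
                                int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int b = indices[i];
        if (0 <= b && b < num_bins)
            atomicAdd(&bins[b], values[i]);  // hardware-supported atomic update
    }
}

A production implementation would additionally privatize the histogram into per-block copies in shared or global memory to reduce conflicts; trading conflict rates against memory footprint in that way is the design space the abstract refers to.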

RIS

TY - GEN

T1 - Compiling generalized histograms for GPU

AU - Henriksen, Troels

AU - Hellfritzsch, Sune

AU - Sadayappan, Ponnuswamy

AU - Oancea, Cosmin

PY - 2020

Y1 - 2020

N2 - We present and evaluate an implementation technique for histogram-like computations on GPUs that ensures both work-efficient asymptotic cost, support for arbitrary associative and commutative operators, and efficient use of hardware-supported atomic operations when applicable. Based on a systematic empirical examination of the design space, we develop a technique that balances conflict rates and memory footprint. We demonstrate our technique both as a library implementation in CUDA, as well as by extending the parallel array language Futhark with a new construct for expressing generalized histograms, and by supporting this construct with several compiler optimizations. We show that our histogram implementation taken in isolation outperforms similar primitives from CUB, and that it is competitive or outperforms the hand-written code of several application benchmarks, even when the latter is specialized for a class of datasets.

AB - We present and evaluate an implementation technique for histogram-like computations on GPUs that ensures both work-efficient asymptotic cost, support for arbitrary associative and commutative operators, and efficient use of hardware-supported atomic operations when applicable. Based on a systematic empirical examination of the design space, we develop a technique that balances conflict rates and memory footprint. We demonstrate our technique both as a library implementation in CUDA, as well as by extending the parallel array language Futhark with a new construct for expressing generalized histograms, and by supporting this construct with several compiler optimizations. We show that our histogram implementation taken in isolation outperforms similar primitives from CUB, and that it is competitive or outperforms the hand-written code of several application benchmarks, even when the latter is specialized for a class of datasets.

KW - functional programming

KW - GPU

KW - parallelism

UR - http://www.scopus.com/inward/record.url?scp=85102348343&partnerID=8YFLogxK

U2 - 10.1109/SC41405.2020.00101

DO - 10.1109/SC41405.2020.00101

M3 - Article in proceedings

AN - SCOPUS:85102348343

BT - Proceedings of SC 2020

PB - IEEE

T2 - 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020

Y2 - 9 November 2020 through 19 November 2020

ER -
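
The abstract also claims support for arbitrary associative and commutative operators, not only those backed by a native hardware atomic. Again as an illustration rather than the paper's technique, the standard fallback emulates an atomic update for such an operator (here, floating-point max) with a compare-and-swap retry loop; the helper and kernel names are hypothetical.

// Minimal sketch of handling an operator with no direct hardware atomic
// (here: float max) via an atomicCAS retry loop. Illustrative only; the
// paper's scheme additionally balances conflict rates and memory footprint
// across privatized subhistograms.
#include <cuda_runtime.h>

__device__ void atomic_max_float(float *addr, float val)
{
    unsigned int *p = (unsigned int *) addr;
    unsigned int old = *p, assumed;
    do {
        assumed = old;
        float cur = __uint_as_float(assumed);
        float upd = fmaxf(cur, val);          // apply the operator
        old = atomicCAS(p, assumed, __float_as_uint(upd));
    } while (assumed != old);                 // retry on conflict
}

__global__ void hist_atomic_max(float *bins, int num_bins,
                                const int *indices, const float *values,
                                int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int b = indices[i];
        if (0 <= b && b < num_bins)
            atomic_max_float(&bins[b], values[i]);
    }
}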
