Strategies for regular segmented reductions on GPU

Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed

Standard

Strategies for regular segmented reductions on GPU. / Larsen, Rasmus Wriedt; Henriksen, Troels.

Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing. Association for Computing Machinery, 2017. pp. 42-52.

Harvard

Larsen, RW & Henriksen, T 2017, Strategies for regular segmented reductions on GPU. in Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing. Association for Computing Machinery, pp. 42-52, 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing, Oxford, United Kingdom, 07/09/2017. https://doi.org/10.1145/3122948.3122952

APA

Larsen, R. W., & Henriksen, T. (2017). Strategies for regular segmented reductions on GPU. In Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing (pp. 42-52). Association for Computing Machinery. https://doi.org/10.1145/3122948.3122952

Vancouver

Larsen RW, Henriksen T. Strategies for regular segmented reductions on GPU. In Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing. Association for Computing Machinery. 2017. p. 42-52 https://doi.org/10.1145/3122948.3122952

Author

Larsen, Rasmus Wriedt ; Henriksen, Troels. / Strategies for regular segmented reductions on GPU. Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing. Association for Computing Machinery, 2017. pp. 42-52

Bibtex

@inproceedings{9b2c9d9bff73443dbcb7f9fc7a4a253c,

title = "Strategies for regular segmented reductions on GPU",

abstract = "We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.",

keywords = "Functional programming, GPGPU, Parallelism",

author = "Larsen, {Rasmus Wriedt} and Troels Henriksen",

year = "2017",

doi = "10.1145/3122948.3122952",

language = "English",

pages = "42--52",

booktitle = "Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing",

publisher = "Association for Computing Machinery",

note = "Conference date: 07-09-2017 through 07-09-2017",

}
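For readers unfamiliar with the term used in the abstract: a *regular* segmented reduction reduces each of several equal-length segments of a flat array independently, producing one result per segment (in Futhark terms, roughly a `map` of a `reduce` over the rows of a two-dimensional array). The sketch below is only a sequential reference for those semantics, under the assumption that the operator is associative with a neutral element; it does not reproduce any of the three GPU strategies the paper proposes, and the function name `regular_segmented_reduce` is hypothetical.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def regular_segmented_reduce(op: Callable[[T, T], T], ne: T,
                             xs: List[T], seg_len: int) -> List[T]:
    """Hypothetical reference: reduce each consecutive segment of xs.

    "Regular" means every segment has the same length seg_len, so
    len(xs) must be a multiple of seg_len. op is assumed associative
    with neutral element ne, which is what lets a GPU implementation
    reorder and split the work within (or across) segments.
    """
    if seg_len <= 0 or len(xs) % seg_len != 0:
        raise ValueError("len(xs) must be a multiple of a positive seg_len")
    # One independent reduction per segment; a GPU version would map
    # these onto threads, workgroups, or several workgroups per segment.
    return [reduce(op, xs[i:i + seg_len], ne)
            for i in range(0, len(xs), seg_len)]
```

For example, `regular_segmented_reduce(lambda a, b: a + b, 0, [1, 2, 3, 4, 5, 6], 3)` sums the segments `[1, 2, 3]` and `[4, 5, 6]`, giving `[6, 15]`. How efficiently a GPU executes this depends on the number and length of segments, which is the workload variation the abstract refers to.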

RIS

TY - GEN

T1 - Strategies for regular segmented reductions on GPU

AU - Larsen, Rasmus Wriedt

AU - Henriksen, Troels

N1 - Conference code: 6

PY - 2017

Y1 - 2017

N2 - We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.

AB - We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.

KW - Functional programming

KW - GPGPU

KW - Parallelism

UR - http://www.scopus.com/inward/record.url?scp=85030990504&partnerID=8YFLogxK

U2 - 10.1145/3122948.3122952

DO - 10.1145/3122948.3122952

M3 - Article in proceedings

AN - SCOPUS:85030990504

SP - 42

EP - 52

BT - Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing

PB - Association for Computing Machinery

Y2 - 7 September 2017 through 7 September 2017

ER -

Department of Computer Science