Strategies for regular segmented reductions on GPU
Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed
Standard
Strategies for regular segmented reductions on GPU. / Larsen, Rasmus Wriedt; Henriksen, Troels.
Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing. Association for Computing Machinery, 2017. p. 42-52.
RIS
TY - GEN
T1 - Strategies for regular segmented reductions on GPU
AU - Larsen, Rasmus Wriedt
AU - Henriksen, Troels
N1 - Conference code: 6
PY - 2017
Y1 - 2017
N2 - We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.
AB - We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.
KW - Functional programming
KW - GPGPU
KW - Parallelism
UR - http://www.scopus.com/inward/record.url?scp=85030990504&partnerID=8YFLogxK
U2 - 10.1145/3122948.3122952
DO - 10.1145/3122948.3122952
M3 - Article in proceedings
AN - SCOPUS:85030990504
SP - 42
EP - 52
BT - Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing
PB - Association for Computing Machinery
Y2 - 7 September 2017 through 7 September 2017
ER -
ID: 188403745