Incremental flattening for nested data parallelism

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Incremental flattening for nested data parallelism. / Henriksen, Troels; Thorøe, Frederik; Elsman, Martin; Oancea, Cosmin.

PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming. Association for Computing Machinery, 2019. p. 53-67.


Harvard

Henriksen, T, Thorøe, F, Elsman, M & Oancea, C 2019, Incremental flattening for nested data parallelism. in PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming. Association for Computing Machinery, pp. 53-67, 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019, Washington, United States, 16/02/2019. https://doi.org/10.1145/3293883.3295707

APA

Henriksen, T., Thorøe, F., Elsman, M., & Oancea, C. (2019). Incremental flattening for nested data parallelism. In PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming (pp. 53-67). Association for Computing Machinery. https://doi.org/10.1145/3293883.3295707

Vancouver

Henriksen T, Thorøe F, Elsman M, Oancea C. Incremental flattening for nested data parallelism. In PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming. Association for Computing Machinery. 2019. p. 53-67 https://doi.org/10.1145/3293883.3295707

Author

Henriksen, Troels ; Thorøe, Frederik ; Elsman, Martin ; Oancea, Cosmin. / Incremental flattening for nested data parallelism. PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming. Association for Computing Machinery, 2019. pp. 53-67

Bibtex

@inproceedings{b10d25d422534ddda7f653745f3eb913,
title = "Incremental flattening for nested data parallelism",
abstract = "Compilation techniques for nested-parallel applications that can adapt to hardware and dataset characteristics are vital for unlocking the power of modern hardware. This paper proposes such a technique, which builds on flattening and is applied in the context of a functional data-parallel language. Our solution uses the degree of utilized parallelism as the driver for generating a multitude of code versions, which together cover all possible mappings of the application's regular nested parallelism to the levels of parallelism supported by the hardware. These code versions are then combined into one program by guarding them with predicates, whose threshold values are automatically tuned to hardware and dataset characteristics. Our unsupervised method, which statically clusters datasets to code versions, is different from autotuning work that typically searches for the combination of code transformations producing a single version, best suited for a specific dataset or on average for all datasets. We demonstrate, by fully integrating our technique into the repertoire of a compiler for the Futhark programming language, significant performance gains on two GPUs for three real-world applications from the financial domain and for six Rodinia benchmarks.",
keywords = "Compilers, Functional language, GPGPU, Parallel",
author = "Troels Henriksen and Frederik Thor{\o}e and Martin Elsman and Cosmin Oancea",
year = "2019",
month = feb,
day = "16",
doi = "10.1145/3293883.3295707",
language = "English",
pages = "53--67",
booktitle = "PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming",
publisher = "Association for Computing Machinery",
note = "24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019 ; Conference date: 16-02-2019 Through 20-02-2019",

}

RIS

TY - GEN

T1 - Incremental flattening for nested data parallelism

AU - Henriksen, Troels

AU - Thorøe, Frederik

AU - Elsman, Martin

AU - Oancea, Cosmin

PY - 2019/2/16

Y1 - 2019/2/16

N2 - Compilation techniques for nested-parallel applications that can adapt to hardware and dataset characteristics are vital for unlocking the power of modern hardware. This paper proposes such a technique, which builds on flattening and is applied in the context of a functional data-parallel language. Our solution uses the degree of utilized parallelism as the driver for generating a multitude of code versions, which together cover all possible mappings of the application's regular nested parallelism to the levels of parallelism supported by the hardware. These code versions are then combined into one program by guarding them with predicates, whose threshold values are automatically tuned to hardware and dataset characteristics. Our unsupervised method, which statically clusters datasets to code versions, is different from autotuning work that typically searches for the combination of code transformations producing a single version, best suited for a specific dataset or on average for all datasets. We demonstrate, by fully integrating our technique into the repertoire of a compiler for the Futhark programming language, significant performance gains on two GPUs for three real-world applications from the financial domain and for six Rodinia benchmarks.

AB - Compilation techniques for nested-parallel applications that can adapt to hardware and dataset characteristics are vital for unlocking the power of modern hardware. This paper proposes such a technique, which builds on flattening and is applied in the context of a functional data-parallel language. Our solution uses the degree of utilized parallelism as the driver for generating a multitude of code versions, which together cover all possible mappings of the application's regular nested parallelism to the levels of parallelism supported by the hardware. These code versions are then combined into one program by guarding them with predicates, whose threshold values are automatically tuned to hardware and dataset characteristics. Our unsupervised method, which statically clusters datasets to code versions, is different from autotuning work that typically searches for the combination of code transformations producing a single version, best suited for a specific dataset or on average for all datasets. We demonstrate, by fully integrating our technique into the repertoire of a compiler for the Futhark programming language, significant performance gains on two GPUs for three real-world applications from the financial domain and for six Rodinia benchmarks.

KW - Compilers

KW - Functional language

KW - GPGPU

KW - Parallel

UR - http://www.scopus.com/inward/record.url?scp=85064234149&partnerID=8YFLogxK

U2 - 10.1145/3293883.3295707

DO - 10.1145/3293883.3295707

M3 - Article in proceedings

AN - SCOPUS:85064234149

SP - 53

EP - 67

BT - PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming

PB - Association for Computing Machinery

T2 - 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019

Y2 - 16 February 2019 through 20 February 2019

ER -
