Semi-automatic tool to ease the creation and optimization of GPU programs

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Standard

Semi-automatic tool to ease the creation and optimization of GPU programs. / Jepsen, Jacob.

Proceedings of the 43rd International Conference on Parallel Processing Workshops: ICPPW 2014. IEEE, 2014. p. 196-205.

Harvard

Jepsen, J 2014, Semi-automatic tool to ease the creation and optimization of GPU programs. in Proceedings of the 43rd International Conference on Parallel Processing Workshops: ICPPW 2014. IEEE, pp. 196-205, 43rd International Conference on Parallel Processing Workshops, ICPPW 2014, Minneapolis, United States, 09/09/2014. https://doi.org/10.1109/ICPPW.2014.36

APA

Jepsen, J. (2014). Semi-automatic tool to ease the creation and optimization of GPU programs. In Proceedings of the 43rd International Conference on Parallel Processing Workshops: ICPPW 2014 (pp. 196-205). IEEE. https://doi.org/10.1109/ICPPW.2014.36

Vancouver

Jepsen J. Semi-automatic tool to ease the creation and optimization of GPU programs. In Proceedings of the 43rd International Conference on Parallel Processing Workshops: ICPPW 2014. IEEE. 2014. p. 196-205. https://doi.org/10.1109/ICPPW.2014.36

Author

Jepsen, Jacob. / Semi-automatic tool to ease the creation and optimization of GPU programs. Proceedings of the 43rd International Conference on Parallel Processing Workshops: ICPPW 2014. IEEE, 2014. pp. 196-205

Bibtex

@inproceedings{fb142d0db43f4ffab201955a33a54902,
title = "Semi-automatic tool to ease the creation and optimization of GPU programs",
abstract = "We present a tool that reduces the development time of GPU-executable code. We implement a catalogue of common optimizations specific to the GPU architecture. Through the tool, the programmer can semi-automatically transform a computationally-intensive code section into GPU-executable form and apply optimizations thereto. Based on experiments, the code generated by the tool can be 3-256X faster than code generated by an OpenACC compiler, 4-37X faster than optimized CPU code, and attain up to 25% of peak performance of the GPU. We found that by using pattern-matching rules, many of the transformations can be performed automatically, which makes the tool usable for both novices and experts in GPU programming.",
author = "Jacob Jepsen",
year = "2014",
doi = "10.1109/ICPPW.2014.36",
language = "English",
isbn = "978-1-4799-5615-9",
pages = "196--205",
booktitle = "Proceedings of the 43rd International Conference on Parallel Processing Workshops",
publisher = "IEEE",
note = "43rd International Conference on Parallel Processing Workshops, ICPPW 2014 ; Conference date: 09-09-2014 Through 12-09-2014",
}

RIS

TY - GEN

T1 - Semi-automatic tool to ease the creation and optimization of GPU programs

AU - Jepsen, Jacob

PY - 2014

Y1 - 2014

N2 - We present a tool that reduces the development time of GPU-executable code. We implement a catalogue of common optimizations specific to the GPU architecture. Through the tool, the programmer can semi-automatically transform a computationally-intensive code section into GPU-executable form and apply optimizations thereto. Based on experiments, the code generated by the tool can be 3-256X faster than code generated by an OpenACC compiler, 4-37X faster than optimized CPU code, and attain up to 25% of peak performance of the GPU. We found that by using pattern-matching rules, many of the transformations can be performed automatically, which makes the tool usable for both novices and experts in GPU programming.

U2 - 10.1109/ICPPW.2014.36

DO - 10.1109/ICPPW.2014.36

M3 - Article in proceedings

SN - 978-1-4799-5615-9

SP - 196

EP - 205

BT - Proceedings of the 43rd International Conference on Parallel Processing Workshops

PB - IEEE

T2 - 43rd International Conference on Parallel Processing Workshops, ICPPW 2014

Y2 - 9 September 2014 through 12 September 2014

ER -

ID: 162745676