PhD defence by Philip Munksgaard
Static and Dynamic Analyses for Efficient GPU Execution
This thesis describes and discusses various techniques for automatically optimizing memory usage in GPU programs. The headliner is the "array short-circuiting" optimization, which aims to tackle some of the inherent overhead caused by the functional programming style of array programming. In particular, array short-circuiting can automatically rewrite array updates and concatenations so that they happen without the use of temporary arrays. The compiler achieves this by using an intermediate representation for the compiled program called FunMem. FunMem associates arrays with index functions in a format called LMADs and uses those index functions to support zero-overhead change-of-layout operations on arrays, such as transpositions, slices and more. Furthermore, to support array short-circuiting, we present a novel technique for proving whether the elements represented by two LMADs are disjoint. In our experimental evaluation, we show that array short-circuiting is able to optimize complex benchmark programs written in the Futhark programming language, improving performance by as much as 2x. The resulting programs are competitive with hand-written and hand-optimized public benchmark implementations written in the lower-level OpenCL and CUDA languages.
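To make the idea of an index function concrete, here is a minimal Python sketch of an LMAD-style descriptor: an offset plus one (size, stride) pair per dimension. It is an illustration only, not the FunMem representation itself; the names Lmad, row_major, slice_dim and disjoint_bruteforce are hypothetical. The disjointness check simply enumerates all offsets, which only works for small, fully known sizes and is not the static proof technique presented in the thesis.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Lmad:
    """Hypothetical LMAD-style index function: offset + sum(i_k * stride_k)."""
    offset: int
    dims: tuple  # one (size, stride) pair per dimension

    def index(self, idx):
        # Map a multi-dimensional index to a flat memory offset.
        return self.offset + sum(i * s for i, (_, s) in zip(idx, self.dims))

    def transpose(self):
        # Changing the layout only permutes the (size, stride) pairs;
        # no element is copied.
        return Lmad(self.offset, tuple(reversed(self.dims)))

    def slice_dim(self, axis, start, count, step=1):
        # A strided slice adjusts the offset and one stride; again no copy.
        _, stride = self.dims[axis]
        dims = list(self.dims)
        dims[axis] = (count, stride * step)
        return Lmad(self.offset + start * stride, tuple(dims))

    def offsets(self):
        # Enumerate every flat offset the descriptor can reach.
        ranges = [range(n) for n, _ in self.dims]
        return {self.index(idx) for idx in product(*ranges)}


def row_major(rows, cols):
    # Standard row-major layout for a rows x cols array.
    return Lmad(0, ((rows, cols), (cols, 1)))


def disjoint_bruteforce(a, b):
    # Assumption: sizes are small concrete integers, so enumeration is feasible.
    return a.offsets().isdisjoint(b.offsets())


a = row_major(3, 4)
t = a.transpose()
even_cols = a.slice_dim(axis=1, start=0, count=2, step=2)
odd_cols = a.slice_dim(axis=1, start=1, count=2, step=2)
```

In this sketch, a.index((1, 2)) and t.index((2, 1)) refer to the same memory location, and even_cols and odd_cols touch disjoint sets of offsets. That second property is the kind of fact a compiler needs before it can let two array views share one buffer.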
Principal Supervisor Cosmin Oancea
Co-Supervisor Troels Henriksen
Associate Professor Torben Ægidius Mogensen, DIKU
Professor Phil Trinder, School of Computing Science, Glasgow University
Full Professor Lawrence Rauchwerger, Department of Computer Science, University of Illinois at Urbana-Champaign
For an electronic copy of the thesis, please visit the PhD Programme page.