Investigating Just-In-Time Compilation of Vector Bytecode

Master's defence by Johannes Lund, DIKU

The defence takes place on 19/12/2012 at 10:30 in Auditorium B, NBI, Blegdamsvej 17

Abstract:

Modern scientific research fields, such as physics and chemistry, require access to vast amounts of computational resources. These resources typically range from general-purpose GPUs and single multi-core CPUs to large clusters, and the underlying architectures change rapidly, resulting in a wide range of available platforms. Utilizing them requires in-depth, low-level knowledge of High Performance Computing (HPC). Thus, scientists who are already highly skilled in their own field and need computational resources must also understand HPC intimately to be successful. We would like scientists to express their problems in high-level frameworks, such as NumPy or MATLAB, and then let the framework optimize the problem into efficient low-level code. However, most frameworks only support a limited set of architectures, and we wish to let scientists use any architecture at their disposal. One framework that strives to do this is cphVB. It executes efficiently on a wide range of architectures, with the goal of bridging the gap between high-productivity languages and high-performance execution. This thesis presents an extension to cphVB that enables it to increase performance using just-in-time (JIT) compilation techniques.

We have implemented a JIT framework for cphVB. At runtime, we transform cphVB bytecode into an abstract syntax tree (AST) representation that forms the basis for further optimizations. From the ASTs we auto-generate C code, which is compiled into computational kernels and executed. These kernels incorporate temporary-array removal and loop fusion, which are the main contributors to the achieved speedups. To hide the overhead of kernel creation, we also implement a cache for the compiled kernels. We show that this approach is well suited to the problems faced by cphVB. Compared to NumPy, we achieve speedups of a factor of 10.95 for a 2D shallow water simulation and 7.51 for a Jacobi stencil running on a single CPU core.
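To illustrate the idea (this is a minimal sketch, not code from the thesis), consider the array expression r = a*b + c. A naive element-wise evaluation would first materialize a temporary array for a*b and then add c in a second pass; a fused, auto-generated C kernel of the kind described above could instead perform both operations in a single loop with no temporary. The function and variable names below are hypothetical.

/* Hypothetical sketch of a generated kernel for r = a*b + c.
 * Loop fusion combines the multiply and add into one pass,
 * and no temporary array for a*b is ever allocated. */
#include <stddef.h>

void fused_kernel(double *r, const double *a, const double *b,
                  const double *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        r[i] = a[i] * b[i] + c[i];  /* one pass, no temporary */
    }
}

Caching the compiled form of such a kernel means that repeated executions of the same expression, as in an iterative stencil computation, pay the compilation cost only once.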

Supervisor: Brian Vinter, KU

Censor: Josva Kleist, AAU