On the Path to Performance -- Using Greedy and Exact Seriation Algorithms to Increase the Kernel Instruction Count for a CUDA back-end for Numerical Python

Master Thesis Defence by Clara Bettina Behrmann

Abstract:

Seriation is a combinatorial data problem, which aims at revealing hidden structures in data, such as subgroups and clusters. When given a set of objects, and a similarity between them, the seriation problem seeks to order the objects along a line, such that objects that are similar to each other appear close in the ordering.

Seriation has been used and reinvented widely, and a plethora of different seriation algorithms exist. In this thesis we describe Seriation and compare a select number of seriation algorithms, before finally examining how seriation can be used to increase the performance of the Copenhagen Vector Oriented Byte Code back-end for Numerical Python.

Copenhagen Vector Oriented Byte Code is a back-end for Numerical Python, which aims to enable the programmer to utilize the power inherent in heterogeneous systems. One area where cphVB can be improved is that of kernel construction. At the moment, kernels are constructed in a naive fashion that does not maximize the kernel instruction count. Using Seriation to bundle together operation in a way, that maximizes the kernel instruction count is therefore expected to increase the gain in performance for cphVB.

Preliminary results indicate that seriation can be used to reduce the number of kernels, and may even be able to decrease the number of data transfers needed, by ordering operations that work on the same operands closely together.

The defence will be in English.

External Examiner:

Inge Li Gørtz, IMM/DTU

Supervisor:

Ken Friis Larsen, DIKU