Optimization of Generic Multiset Programming for bulk data processing – Københavns Universitet

Master's thesis defence by Thomas Kjeldsen

Abstract:

We examine the performance of the Generic Multiset Programming (GMP) library for SQL-style programming, which allows naive programming with Cartesian products while avoiding the quadratic time needed to compute them explicitly. The performance of GMP is evaluated and compared against MySQL, and the benchmarks of Henglein and Larsen (2010, 2011) are replicated, with results that are broadly similar but differ notably in some cases. Through code review and profiling, we demonstrate the potential for performance improvements, determine a practical lower bound on efficiency, and propose and discuss a number of concrete optimizations to the GMP library.
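The key idea behind avoiding the quadratic cost can be illustrated with a small sketch: a Cartesian product is kept symbolic (both operands are stored, the pairs are never built), and when it is filtered by an equality predicate on keys, the join is evaluated by grouping instead of by enumerating every pair. This is not GMP's actual API. GMP is a Haskell library which, as we understand it, relies on generic discriminators; here a Python dict (hashing) stands in for them, and the names `Product`, `EqJoin` and `select` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Product:
    """Hypothetical symbolic Cartesian product: stores operands, not pairs."""
    left: List[Any]
    right: List[Any]

    def naive_pairs(self) -> List[Tuple[Any, Any]]:
        # Materializes every pair: O(len(left) * len(right)).
        return [(a, b) for a in self.left for b in self.right]


@dataclass(frozen=True)
class EqJoin:
    """Hypothetical equijoin predicate: key_left(a) == key_right(b)."""
    key_left: Callable[[Any], Any]
    key_right: Callable[[Any], Any]

    def __call__(self, pair: Tuple[Any, Any]) -> bool:
        a, b = pair
        return self.key_left(a) == self.key_right(b)


def select(pred: Callable[[Tuple[Any, Any]], bool], prod: Product) -> List[Tuple[Any, Any]]:
    """Filter a symbolic product, exploiting equijoin predicates when possible."""
    if isinstance(pred, EqJoin):
        # Group the right operand by key in one pass (hashing stands in
        # for GMP's discriminators in this sketch).
        buckets: dict = {}
        for b in prod.right:
            buckets.setdefault(pred.key_right(b), []).append(b)
        # Each left element visits only its matching bucket, so the work is
        # linear in the input size plus the size of the output.
        return [(a, b) for a in prod.left for b in buckets.get(pred.key_left(a), [])]
    # Arbitrary predicate: no structure to exploit, fall back to all pairs.
    return [p for p in prod.naive_pairs() if pred(p)]


# Usage: join employees to departments on the department id.
emps = [("ann", 1), ("bob", 2), ("cid", 1)]
depts = [(1, "Sales"), (2, "R&D")]
q = Product(emps, depts)
join = EqJoin(lambda e: e[1], lambda d: d[0])
print(select(join, q))
# [(('ann', 1), (1, 'Sales')), (('bob', 2), (2, 'R&D')), (('cid', 1), (1, 'Sales'))]
```

The grouped evaluation returns exactly the pairs that filtering `naive_pairs()` would, in the same order, but without ever building the full product; only predicates the evaluator recognizes (here, `EqJoin`) get the fast path.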

Advisor and internal evaluator: Fritz Henglein, DIKU

External evaluator: Mads Rosendahl, RUC

The defence is public. All are welcome. No registration required.