PhD defence by Lorenzo Beretta
Follow this link to participate on Zoom
Title
Sublinear Algorithms, Online Packing, Hashing and Clustering
Abstract
In this thesis we study the theoretical properties of several algorithmic problems. The first two problems aim at completing a certain task in sublinear time, namely using a number of operation that is sublinear in the input size. A short description of these tasks follows.
- Earth Mover’s Distance: We are given two distribution µ, ν over a generic metric space (M, d). A transport plan between µ and ν is a distribution ξ over M2 that has µ and ν as its marginals. We define the cost of a transport plan ξ as E(x,y)∼ξ[d(x, y)]. Our task is to compute the minimum cost of transport plans between µ and ν.
- Sum estimation via weighted sampling: We are given a set U of items and each item u ∈ U has a weight w(u). We can sample elements from U with probability proportional to their weight. Our task is to estimate the combined weight P u∈U w(u) using as few samples as possible.
Then, we study two problems concerned with online packing of polygons. In both cases, we give upper and lower bounds on the competitive ratio of these problems, described below.
- Online packing of rectangles: We are given a sequence of axis-parallel rectangles online and we have to irrevocably place them on the plane so as to minimize the area or perimeter of their axis-parallel bounding box.
- Translational packing of convex polygons: We are given a sequence of convex polygons online and we have to irrevocably place them in a given container on the plane without rotating them. The goal is to use as little container space as possible. We prove a surprising superconstant lower bound on the competitive ratio for several well-studied container types.
Besides packing, we study hash functions.
- Locally uniform hashing: We design Tornado tabulation, a new tabulation-based hash function that aims at filling the gap between (unrealistic) fully-random hashing often used in the analysis of algorithms and practical hash functions with weak theoretical guarantees. In theory, Tornado has strong distributional properties, ensuring that its keys enjoy a certain local full randomness. Moreover, Tornado is practical and it is implementable in a few lines of C code.
Finally, we include a result on clustering.
- Local-search seeding for k-means: The most popular heuristic for k-means clustering, Lloyd’s algorithm, inherits its theoretical guarantees from the seeding strategy employed to initialize cluster centers. We design a local-search algorithm to initialize cluster centers that improves the approximation ratios of previous seeding strategies. We claim that our algorithm is practical and include experiments to support its effectiveness.
Supervisors
Principal Supervisor Mikkel Thorup
Co-Supervisor Mikkel Abrahamsen
Assessment Committee
Professor Amir Yehudayoff, Department of Computer Science, UCPH
Professor Petra Berenbrink, Universität Hamburg
Professor Kasper Green Larsen, Aarhus Universitet
For an electronic copy of the thesis, please visit the PhD Programme page.