Fast and Powerful Hashing using Tabulation - by Mikkel Thorup
Keynote talk by Mikkel Thorup at the FOGA (Foundations of Genetic Algorithms) workshop. This keynote talk is open to the public - participation in other parts of the workshop requires formal registration and payment.
Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees.
Simple tabulation hashing dates back to Zobrist . Keys are viewed as consisting of c characters and we have precomputed character tables h1,...,hq mapping characters to random hash values. A key x = (x1, ..., xc) is hashed to h1[x1] ⊕ h2[x2]..... ⊕ hc[xc]. This schemes is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing.
Next we consider twisted tabulation where one character is "twisted" with some simple operations. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing.
Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman . In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully-random hashing.
While these tabulation schemes are all easy to implement and use, their analysis is not.
The 14th ACM/SIGEVO Workshop on Foundations of Genetic Algorithms (FOGA XIV) will take place on January 12-15 in Copenhagen, Denmark.
FOGA is the premier event on the theoretical foundations of evolutionary computation and invites submissions on all kinds of randomised search heuristics, including but not limited to evolutionary algorithms, ant colony optimisation, artificial immune systems, particle swarm optimisation, simulated annealing, Bayesian optimisation, and other Monte Carlo methods for search and optimisation.
The goal of FOGA is to advance the theoretical understanding of randomised search heuristics and to contribute to making randomised search heuristics more useful in practice. We particularly encourage submissions bridging theory and practice. In addition to rigorous mathematical investigations, experimental studies contributing towards the theoretical foundations of randomised search heuristics are also welcome.