Talk by Mikkel Thorup

Hashing and Pseudo-Random Numbers with Strong Distributional Guarantees

9th October 2013

At 15.15. Auditorium 10


For the sum of independent Bernoulli random variables, we expect a exponential concentration around the mean, e.g., by Hoeffding's inequality, with n random 0/1 variable, the probability of deviating by more than d*n from the means is bounded by 2*exp(-2*d^2*n).

However, computers do not generate independent variables. Even if they could, we often want a random variable to be computed as a function of a key so that the same key always gets the same random variable. Such a function h is referred to as a hash function.  As an example, we could use h to decide where to store information associated with a key x so that we can later retrieve it.  Another example is that we want to sample from different sets so that given the samples from two sets, we can estimate the size of their intersection. This requires that the sampling decision for an element is the same no matter which set it is sampled from.

Sampling with existing hash functions only gives polynomial concentration bounds. However, we present a hashing scheme, twisted tabulation, that gives exponential concentration bounds akin to Chernoff-Hoeffding bounds.  The scheme is simple and fast, e.g., used as a pseudo-random number generator is 6 times faster than the standard generator provided in C, yet it is the first realistic scheme to provide strong distributional guarantees.