Fast hashing with strong concentration bounds
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Documents
- Fast Hashing with Strong Concentration Bounds
Final published version, 894 KB, PDF document
Previous work on tabulation hashing by PÇtraşcu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash based sums, e.g., the number of balls/keys hashing to a given bin, but under some quite severe restrictions on the expected values of these sums. The basic idea in tabulation hashing is to view a key as consisting of c=O(1) characters, e.g., a 64-bit key as c=8 characters of 8-bits. The character domain ς should be small enough that character tables of size |ς| fit in fast cache. The schemes then use O(1) tables of this size, so the space of tabulation hashing is O(|ς|). However, the concentration bounds by PÇtraşcu and Thorup only apply if the expected sums are g‰ |ς|. To see the problem, consider the very simple case where we use tabulation hashing to throw n balls into m bins and want to analyse the number of balls in a given bin. With their concentration bounds, we are fine if n=m, for then the expected value is 1. However, if m=2, as when tossing n unbiased coins, the expected value n/2 is ≫ |ς| for large data sets, e.g., data sets that do not fit in fast cache. To handle expectations that go beyond the limits of our small space, we need a much more advanced analysis of simple tabulation, plus a new tabulation technique that we call tabulation-permutation hashing which is at most twice as slow as simple tabulation. No other hashing scheme of comparable speed offers similar Chernoff-style concentration bounds.
Original language | English |
---|---|
Title of host publication | STOC 2020 - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing |
Editors | Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, Julia Chuzhoy |
Publisher | Association for Computing Machinery |
Publication date | 2020 |
Pages | 1265-1278 |
ISBN (Electronic) | 9781450369794 |
DOIs | |
Publication status | Published - 2020 |
Event | 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020 - Chicago, United States Duration: 22 Jun 2020 → 26 Jun 2020 |
Conference
Conference | 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020 |
---|---|
Land | United States |
By | Chicago |
Periode | 22/06/2020 → 26/06/2020 |
Sponsor | ACM Special Interest Group on Algorithms and Computation Theory (SIGACT) |
Series | Proceedings of the Annual ACM Symposium on Theory of Computing |
---|---|
ISSN | 0737-8017 |
- Chernoff bounds, Concentration bounds, Hashing, Sampling, Streaming algorithms
Research areas
Number of downloads are based on statistics from Google Scholar and www.ku.dk
ID: 258498058