Locally Uniform Hashing
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Locally Uniform Hashing. / Bercea, Ioana O.; Beretta, Lorenzo; Klausen, Jonas; Houen, Jakob Bak Tejs; Thorup, Mikkel.
Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press, 2023. p. 1440-1470.Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - GEN
T1 - Locally Uniform Hashing
AU - Bercea, Ioana O.
AU - Beretta, Lorenzo
AU - Klausen, Jonas
AU - Houen, Jakob Bak Tejs
AU - Thorup, Mikkel
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Hashing is a common technique used in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic) uniform/fully-random hash functions. In this paper, we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their performance.To this end, we present tornado tabulation hashing, which is simple, fast, and exhibits a certain full, local randomness property that provably makes diverse algorithms perform almost as if (abstract) fully-random hashing was used. For example, this includes classic linear probing, the widely used HyperLogLog algorithm of Flajolet, Fusy, Gandouet, Meunier [AOFA'97] for counting distinct elements, and the one-permutation hashing of Li, Owen, and Zhang [NIPS'12] for large-scale machine learning. We also provide a very efficient solution for the classical problem of obtaining fully-random hashing on a fixed (but unknown to the hash function) set of n keys using O(n) space. As a consequence, we get more efficient implementations of the splitting trick of Dietzfelbinger and Rink [ICALP'09] and the succinct space uniform hashing of Pagh and Pagh [SICOMP'08].Tornado tabulation hashing is based on a simple method to systematically break dependencies in tabulation-based hashing techniques.
AB - Hashing is a common technique used in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic) uniform/fully-random hash functions. In this paper, we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their performance.To this end, we present tornado tabulation hashing, which is simple, fast, and exhibits a certain full, local randomness property that provably makes diverse algorithms perform almost as if (abstract) fully-random hashing was used. For example, this includes classic linear probing, the widely used HyperLogLog algorithm of Flajolet, Fusy, Gandouet, Meunier [AOFA'97] for counting distinct elements, and the one-permutation hashing of Li, Owen, and Zhang [NIPS'12] for large-scale machine learning. We also provide a very efficient solution for the classical problem of obtaining fully-random hashing on a fixed (but unknown to the hash function) set of n keys using O(n) space. As a consequence, we get more efficient implementations of the splitting trick of Dietzfelbinger and Rink [ICALP'09] and the succinct space uniform hashing of Pagh and Pagh [SICOMP'08].Tornado tabulation hashing is based on a simple method to systematically break dependencies in tabulation-based hashing techniques.
KW - Concentration Bounds
KW - Hashing
KW - Linear Probing
KW - Tabulation Hashing
KW - Uniform Hashing
U2 - 10.1109/FOCS57990.2023.00089
DO - 10.1109/FOCS57990.2023.00089
M3 - Article in proceedings
AN - SCOPUS:85182392091
SP - 1440
EP - 1470
BT - Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023
PB - IEEE Computer Society Press
T2 - 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023
Y2 - 6 November 2023 through 9 November 2023
ER -
ID: 384069326