Locally Uniform Hashing

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Locally Uniform Hashing. / Bercea, Ioana O.; Beretta, Lorenzo; Klausen, Jonas; Houen, Jakob Bak Tejs; Thorup, Mikkel.

Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press, 2023. p. 1440-1470.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Harvard

Bercea, IO, Beretta, L, Klausen, J, Houen, JBT & Thorup, M 2023, Locally Uniform Hashing. in Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press, pp. 1440-1470, 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, United States, 06/11/2023. https://doi.org/10.1109/FOCS57990.2023.00089

APA

Bercea, I. O., Beretta, L., Klausen, J., Houen, J. B. T., & Thorup, M. (2023). Locally Uniform Hashing. In Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023 (pp. 1440-1470). IEEE Computer Society Press. https://doi.org/10.1109/FOCS57990.2023.00089

Vancouver

Bercea IO, Beretta L, Klausen J, Houen JBT, Thorup M. Locally Uniform Hashing. In Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press. 2023. p. 1440-1470 https://doi.org/10.1109/FOCS57990.2023.00089

Author

Bercea, Ioana O. ; Beretta, Lorenzo ; Klausen, Jonas ; Houen, Jakob Bak Tejs ; Thorup, Mikkel. / Locally Uniform Hashing. Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press, 2023. pp. 1440-1470

Bibtex

@inproceedings{a3c410b499e240348725abbf51fa9ea7,

title = "Locally Uniform Hashing",

abstract = "Hashing is a common technique used in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic) uniform/fully-random hash functions. In this paper, we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their performance.To this end, we present tornado tabulation hashing, which is simple, fast, and exhibits a certain full, local randomness property that provably makes diverse algorithms perform almost as if (abstract) fully-random hashing was used. For example, this includes classic linear probing, the widely used HyperLogLog algorithm of Flajolet, Fusy, Gandouet, Meunier [AOFA'97] for counting distinct elements, and the one-permutation hashing of Li, Owen, and Zhang [NIPS'12] for large-scale machine learning. We also provide a very efficient solution for the classical problem of obtaining fully-random hashing on a fixed (but unknown to the hash function) set of n keys using O(n) space. As a consequence, we get more efficient implementations of the splitting trick of Dietzfelbinger and Rink [ICALP'09] and the succinct space uniform hashing of Pagh and Pagh [SICOMP'08].Tornado tabulation hashing is based on a simple method to systematically break dependencies in tabulation-based hashing techniques. ",

keywords = "Concentration Bounds, Hashing, Linear Probing, Tabulation Hashing, Uniform Hashing",

author = "Bercea, {Ioana O.} and Lorenzo Beretta and Jonas Klausen and Houen, {Jakob Bak Tejs} and Mikkel Thorup",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023 ; Conference date: 06-11-2023 Through 09-11-2023",

year = "2023",

doi = "10.1109/FOCS57990.2023.00089",

language = "English",

pages = "1440--1470",

booktitle = "Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023",

publisher = "IEEE Computer Society Press",

address = "United States",

}

RIS

TY - GEN

T1 - Locally Uniform Hashing

AU - Bercea, Ioana O.

AU - Beretta, Lorenzo

AU - Klausen, Jonas

AU - Houen, Jakob Bak Tejs

AU - Thorup, Mikkel

PY - 2023

Y1 - 2023

N2 - Hashing is a common technique used in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic) uniform/fully-random hash functions. In this paper, we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their performance.To this end, we present tornado tabulation hashing, which is simple, fast, and exhibits a certain full, local randomness property that provably makes diverse algorithms perform almost as if (abstract) fully-random hashing was used. For example, this includes classic linear probing, the widely used HyperLogLog algorithm of Flajolet, Fusy, Gandouet, Meunier [AOFA'97] for counting distinct elements, and the one-permutation hashing of Li, Owen, and Zhang [NIPS'12] for large-scale machine learning. We also provide a very efficient solution for the classical problem of obtaining fully-random hashing on a fixed (but unknown to the hash function) set of n keys using O(n) space. As a consequence, we get more efficient implementations of the splitting trick of Dietzfelbinger and Rink [ICALP'09] and the succinct space uniform hashing of Pagh and Pagh [SICOMP'08].Tornado tabulation hashing is based on a simple method to systematically break dependencies in tabulation-based hashing techniques.

AB - Hashing is a common technique used in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic) uniform/fully-random hash functions. In this paper, we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their performance.To this end, we present tornado tabulation hashing, which is simple, fast, and exhibits a certain full, local randomness property that provably makes diverse algorithms perform almost as if (abstract) fully-random hashing was used. For example, this includes classic linear probing, the widely used HyperLogLog algorithm of Flajolet, Fusy, Gandouet, Meunier [AOFA'97] for counting distinct elements, and the one-permutation hashing of Li, Owen, and Zhang [NIPS'12] for large-scale machine learning. We also provide a very efficient solution for the classical problem of obtaining fully-random hashing on a fixed (but unknown to the hash function) set of n keys using O(n) space. As a consequence, we get more efficient implementations of the splitting trick of Dietzfelbinger and Rink [ICALP'09] and the succinct space uniform hashing of Pagh and Pagh [SICOMP'08].Tornado tabulation hashing is based on a simple method to systematically break dependencies in tabulation-based hashing techniques.

KW - Concentration Bounds

KW - Hashing

KW - Linear Probing

KW - Tabulation Hashing

KW - Uniform Hashing

U2 - 10.1109/FOCS57990.2023.00089

DO - 10.1109/FOCS57990.2023.00089

M3 - Article in proceedings

AN - SCOPUS:85182392091

SP - 1440

EP - 1470

BT - Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023

PB - IEEE Computer Society Press

T2 - 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023

Y2 - 6 November 2023 through 9 November 2023

ER -

ID: 384069326

Department of Computer Science