Fast and powerful hashing using tabulation

Research output: Contribution to journalJournal articleResearchpeer-review

Standard

Fast and powerful hashing using tabulation. / Thorup, Mikkel.

In: Communications of the ACM, Vol. 60, No. 7, 07.2017, p. 94-101.

Research output: Contribution to journalJournal articleResearchpeer-review

Harvard

Thorup, M 2017, 'Fast and powerful hashing using tabulation', Communications of the ACM, vol. 60, no. 7, pp. 94-101. https://doi.org/10.1145/3068772

APA

Thorup, M. (2017). Fast and powerful hashing using tabulation. Communications of the ACM, 60(7), 94-101. https://doi.org/10.1145/3068772

Vancouver

Thorup M. Fast and powerful hashing using tabulation. Communications of the ACM. 2017 Jul;60(7):94-101. https://doi.org/10.1145/3068772

Author

Thorup, Mikkel. / Fast and powerful hashing using tabulation. In: Communications of the ACM. 2017 ; Vol. 60, No. 7. pp. 94-101.

Bibtex

@article{d87491d96ef9480fbcbb2f58483aba6b,
title = "Fast and powerful hashing using tabulation",
abstract = "Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist (A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin). Keys are viewed as consisting of c characters and we have precomputed character tables h1, ⋯, hc mapping characters to random hash values. A key x = (x1, ⋯, xc) is hashed to h1[x1] ⊕ h2[x2]⋯ ⊕ hc[xc]. This schemes is very fast with character tables in cache. Although simple tabulation is not even four-independent, it does provide many of the guarantees that are normally obtained via higher independence, for example, linear probing and Cuckoo hashing. Next, we consider twisted tabulation where one input character is {"}twisted{"} in a simple way. The resulting hash function has powerful distributional properties: Chernoffstyle tail bounds and a very small bias for minwise hashing. This is also yields an extremely fast pseudorandom number generator that is provably good for many classic randomized algorithms and data-structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Wegman and Carter.26 In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. Although these tabulation schemes are all easy to implement and use, their analysis is not.",
author = "Mikkel Thorup",
year = "2017",
month = jul,
doi = "10.1145/3068772",
language = "English",
volume = "60",
pages = "94--101",
journal = "Communications of the A C M",
issn = "0001-0782",
publisher = "Association for Computing Machinery, Inc.",
number = "7",

}

RIS

TY - JOUR

T1 - Fast and powerful hashing using tabulation

AU - Thorup, Mikkel

PY - 2017/7

Y1 - 2017/7

N2 - Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist (A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin). Keys are viewed as consisting of c characters and we have precomputed character tables h1, ⋯, hc mapping characters to random hash values. A key x = (x1, ⋯, xc) is hashed to h1[x1] ⊕ h2[x2]⋯ ⊕ hc[xc]. This schemes is very fast with character tables in cache. Although simple tabulation is not even four-independent, it does provide many of the guarantees that are normally obtained via higher independence, for example, linear probing and Cuckoo hashing. Next, we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoffstyle tail bounds and a very small bias for minwise hashing. This is also yields an extremely fast pseudorandom number generator that is provably good for many classic randomized algorithms and data-structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Wegman and Carter.26 In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. Although these tabulation schemes are all easy to implement and use, their analysis is not.

AB - Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist (A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin). Keys are viewed as consisting of c characters and we have precomputed character tables h1, ⋯, hc mapping characters to random hash values. A key x = (x1, ⋯, xc) is hashed to h1[x1] ⊕ h2[x2]⋯ ⊕ hc[xc]. This schemes is very fast with character tables in cache. Although simple tabulation is not even four-independent, it does provide many of the guarantees that are normally obtained via higher independence, for example, linear probing and Cuckoo hashing. Next, we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoffstyle tail bounds and a very small bias for minwise hashing. This is also yields an extremely fast pseudorandom number generator that is provably good for many classic randomized algorithms and data-structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Wegman and Carter.26 In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. Although these tabulation schemes are all easy to implement and use, their analysis is not.

UR - http://www.scopus.com/inward/record.url?scp=85021711221&partnerID=8YFLogxK

U2 - 10.1145/3068772

DO - 10.1145/3068772

M3 - Journal article

AN - SCOPUS:85021711221

VL - 60

SP - 94

EP - 101

JO - Communications of the A C M

JF - Communications of the A C M

SN - 0001-0782

IS - 7

ER -

ID: 181356563