Understanding the Moments of Tabulation Hashing via Chaoses

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Standard

Understanding the Moments of Tabulation Hashing via Chaoses. / Houen, Jakob Bæk Tejs; Thorup, Mikkel.

49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022. red. / Mikolaj Bojanczyk; Emanuela Merelli; David P. Woodruff. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2022. s. 1-19 74 (Leibniz International Proceedings in Informatics, LIPIcs, Bind 229).

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Harvard

Houen, JBT & Thorup, M 2022, Understanding the Moments of Tabulation Hashing via Chaoses. i M Bojanczyk, E Merelli & DP Woodruff (red), 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022., 74, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, Leibniz International Proceedings in Informatics, LIPIcs, bind 229, s. 1-19, 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022, Paris, Frankrig, 04/07/2022. https://doi.org/10.4230/LIPIcs.ICALP.2022.74

APA

Houen, J. B. T., & Thorup, M. (2022). Understanding the Moments of Tabulation Hashing via Chaoses. I M. Bojanczyk, E. Merelli, & D. P. Woodruff (red.), 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022 (s. 1-19). [74] Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. Leibniz International Proceedings in Informatics, LIPIcs Bind 229 https://doi.org/10.4230/LIPIcs.ICALP.2022.74

Vancouver

Houen JBT, Thorup M. Understanding the Moments of Tabulation Hashing via Chaoses. I Bojanczyk M, Merelli E, Woodruff DP, red., 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2022. s. 1-19. 74. (Leibniz International Proceedings in Informatics, LIPIcs, Bind 229). https://doi.org/10.4230/LIPIcs.ICALP.2022.74

Author

Houen, Jakob Bæk Tejs ; Thorup, Mikkel. / Understanding the Moments of Tabulation Hashing via Chaoses. 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022. red. / Mikolaj Bojanczyk ; Emanuela Merelli ; David P. Woodruff. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2022. s. 1-19 (Leibniz International Proceedings in Informatics, LIPIcs, Bind 229).

Bibtex

@inproceedings{983e23ef6e2146c0a2c3ddce137fc3a5,
title = "Understanding the Moments of Tabulation Hashing via Chaoses",
abstract = "Simple tabulation hashing dates back to Zobrist in 1970 and is defined as follows: Each key is viewed as c characters from some alphabet Σ, we have c fully random hash functions h0,..., hc−1: Σ → {0,..., 2l − 1}, and a key x = (x0,..., xc−1) is hashed to (Equation presented) where ⊕ is the bitwise XOR operation. The previous results on tabulation hashing by Pǎtra{\c s}cu and Thorup [J.ACM'11] and by Aamand et al. [STOC'20] focused on proving Chernoff-style tail bounds on hash-based sums, e.g., the number keys hashing to a given value, for simple tabulation hashing, but their bounds do not cover the entire tail. Thus their results cannot bound moments. The paper Dahlgaard et al. [FOCS'15] provides a bound on the moments of certain hash-based sums, but their bound only holds for constant moments, and we need logarithmic moments. Chaoses are random variables of the form (Equation presented) where Xi are independent random variables. Chaoses are a well-studied concept from probability theory, and tight analysis has been proven in several instances, e.g., when the independent random variables are standard Gaussian variables and when the independent random variables have logarithmically convex tails. We notice that hash-based sums of simple tabulation hashing can be seen as a sum of chaoses that are not independent. This motivates us to use techniques from the theory of chaoses to analyze hash-based sums of simple tabulation hashing. In this paper, we obtain bounds for all the moments of hash-based sums for simple tabulation hashing which are tight up to constants depending only on c. In contrast with the previous attempts, our approach will mostly be analytical and does not employ intricate combinatorial arguments. The improved analysis of simple tabulation hashing allows us to obtain bounds for the moments of hash-based sums for the mixed tabulation hashing introduced by Dahlgaard et al. [FOCS'15]. With simple tabulation hashing, there are certain inputs for which the concentration is much worse than with fully random hashing. However, with mixed tabulation, we get logarithmic moment bounds that are only a constant factor worse than those with fully random hashing for any possible input. This is a strong addition to other powerful probabilistic properties of mixed tabulation hashing proved by Dahlgaard et al.",
keywords = "concentration bounds, hashing, moment bounds",
author = "Houen, {Jakob B{\ae}k Tejs} and Mikkel Thorup",
note = "Publisher Copyright: {\textcopyright} Jakob B{\ae}k Tejs Houen and Mikkel Thorup; licensed under Creative Commons License CC-BY 4.0; 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022 ; Conference date: 04-07-2022 Through 08-07-2022",
year = "2022",
month = jul,
doi = "10.4230/LIPIcs.ICALP.2022.74",
language = "English",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",
pages = "1--19",
editor = "Mikolaj Bojanczyk and Emanuela Merelli and Woodruff, {David P.}",
booktitle = "49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022",

}

RIS

TY - GEN

T1 - Understanding the Moments of Tabulation Hashing via Chaoses

AU - Houen, Jakob Bæk Tejs

AU - Thorup, Mikkel

N1 - Publisher Copyright: © Jakob Bæk Tejs Houen and Mikkel Thorup; licensed under Creative Commons License CC-BY 4.0

PY - 2022/7

Y1 - 2022/7

N2 - Simple tabulation hashing dates back to Zobrist in 1970 and is defined as follows: Each key is viewed as c characters from some alphabet Σ, we have c fully random hash functions h0,..., hc−1: Σ → {0,..., 2l − 1}, and a key x = (x0,..., xc−1) is hashed to (Equation presented) where ⊕ is the bitwise XOR operation. The previous results on tabulation hashing by Pǎtraşcu and Thorup [J.ACM'11] and by Aamand et al. [STOC'20] focused on proving Chernoff-style tail bounds on hash-based sums, e.g., the number keys hashing to a given value, for simple tabulation hashing, but their bounds do not cover the entire tail. Thus their results cannot bound moments. The paper Dahlgaard et al. [FOCS'15] provides a bound on the moments of certain hash-based sums, but their bound only holds for constant moments, and we need logarithmic moments. Chaoses are random variables of the form (Equation presented) where Xi are independent random variables. Chaoses are a well-studied concept from probability theory, and tight analysis has been proven in several instances, e.g., when the independent random variables are standard Gaussian variables and when the independent random variables have logarithmically convex tails. We notice that hash-based sums of simple tabulation hashing can be seen as a sum of chaoses that are not independent. This motivates us to use techniques from the theory of chaoses to analyze hash-based sums of simple tabulation hashing. In this paper, we obtain bounds for all the moments of hash-based sums for simple tabulation hashing which are tight up to constants depending only on c. In contrast with the previous attempts, our approach will mostly be analytical and does not employ intricate combinatorial arguments. The improved analysis of simple tabulation hashing allows us to obtain bounds for the moments of hash-based sums for the mixed tabulation hashing introduced by Dahlgaard et al. [FOCS'15]. With simple tabulation hashing, there are certain inputs for which the concentration is much worse than with fully random hashing. However, with mixed tabulation, we get logarithmic moment bounds that are only a constant factor worse than those with fully random hashing for any possible input. This is a strong addition to other powerful probabilistic properties of mixed tabulation hashing proved by Dahlgaard et al.

AB - Simple tabulation hashing dates back to Zobrist in 1970 and is defined as follows: Each key is viewed as c characters from some alphabet Σ, we have c fully random hash functions h0,..., hc−1: Σ → {0,..., 2l − 1}, and a key x = (x0,..., xc−1) is hashed to (Equation presented) where ⊕ is the bitwise XOR operation. The previous results on tabulation hashing by Pǎtraşcu and Thorup [J.ACM'11] and by Aamand et al. [STOC'20] focused on proving Chernoff-style tail bounds on hash-based sums, e.g., the number keys hashing to a given value, for simple tabulation hashing, but their bounds do not cover the entire tail. Thus their results cannot bound moments. The paper Dahlgaard et al. [FOCS'15] provides a bound on the moments of certain hash-based sums, but their bound only holds for constant moments, and we need logarithmic moments. Chaoses are random variables of the form (Equation presented) where Xi are independent random variables. Chaoses are a well-studied concept from probability theory, and tight analysis has been proven in several instances, e.g., when the independent random variables are standard Gaussian variables and when the independent random variables have logarithmically convex tails. We notice that hash-based sums of simple tabulation hashing can be seen as a sum of chaoses that are not independent. This motivates us to use techniques from the theory of chaoses to analyze hash-based sums of simple tabulation hashing. In this paper, we obtain bounds for all the moments of hash-based sums for simple tabulation hashing which are tight up to constants depending only on c. In contrast with the previous attempts, our approach will mostly be analytical and does not employ intricate combinatorial arguments. The improved analysis of simple tabulation hashing allows us to obtain bounds for the moments of hash-based sums for the mixed tabulation hashing introduced by Dahlgaard et al. [FOCS'15]. With simple tabulation hashing, there are certain inputs for which the concentration is much worse than with fully random hashing. However, with mixed tabulation, we get logarithmic moment bounds that are only a constant factor worse than those with fully random hashing for any possible input. This is a strong addition to other powerful probabilistic properties of mixed tabulation hashing proved by Dahlgaard et al.

KW - concentration bounds

KW - hashing

KW - moment bounds

UR - http://www.scopus.com/inward/record.url?scp=85133415753&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.ICALP.2022.74

DO - 10.4230/LIPIcs.ICALP.2022.74

M3 - Article in proceedings

AN - SCOPUS:85133415753

T3 - Leibniz International Proceedings in Informatics, LIPIcs

SP - 1

EP - 19

BT - 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022

A2 - Bojanczyk, Mikolaj

A2 - Merelli, Emanuela

A2 - Woodruff, David P.

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

T2 - 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022

Y2 - 4 July 2022 through 8 July 2022

ER -

ID: 314296202