Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming. / Kacham, Praneeth; Pagh, Rasmus; Thorup, Mikkel; Woodruff, David P.

Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press, 2023. p. 1515-1550 (Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS).

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Kacham, P, Pagh, R, Thorup, M & Woodruff, DP 2023, Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming. in Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press, Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS, pp. 1515-1550, 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, United States, 06/11/2023. https://doi.org/10.1109/FOCS57990.2023.00093

APA

Kacham, P., Pagh, R., Thorup, M., & Woodruff, D. P. (2023). Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming. In Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023 (pp. 1515-1550). IEEE Computer Society Press. Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS https://doi.org/10.1109/FOCS57990.2023.00093

Vancouver

Kacham P, Pagh R, Thorup M, Woodruff DP. Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming. In Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press. 2023. p. 1515-1550. (Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS). https://doi.org/10.1109/FOCS57990.2023.00093

Author

Kacham, Praneeth ; Pagh, Rasmus ; Thorup, Mikkel ; Woodruff, David P. / Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming. Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press, 2023. pp. 1515-1550 (Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS).

Bibtex

@inproceedings{a77199206777408785d2867c4ff538d2,
title = "Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming",
abstract = "We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, for example:•Andoni's F_p estimation algorithm for constant p > 2 (ICASSP, 2017) assumes a random oracle, but achieves optimal space and constant update time. Using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of ℓ_p sampling in a turnstile stream. Both of our algorithms use {\~O}(d1-2/p) bits of space and have O(1) update time.•For 0< p<2, the 1 pm ϵ approximate F_p estimation algorithm of Kane et al., (STOC, 2011) uses an optimal O(ϵ-2 log d) bits of space but has an update time of O(log 2(1/ϵ) log log (1/ϵ)). Using HashPRG, we show that if 1/√d ≤ ϵ ≤ 1/dc for an arbitrarily small constant c > 0, then we can obtain a 1 pm ϵ approximate F_p estimation algorithm that uses the optimal O(ϵ-2 log d) bits of space and has an update time of O(log d) in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation.•CountSketch, with the fine-grained error analysis of Minton and Price (SODA, 2014). For derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead. With HashPRG we obtain an efficient derandomization yielding the same asymptotic space as when assuming a random oracle. Our ability to obtain a time-efficient derandomization makes crucial use of HashPRG's symmetry. We also give the first derandomization of a recent private version of CountSketch.For a d-dimensional vector x being updated in a turnstile stream, we show that ||x||∞ can be estimated up to an additive error of ϵ||x||_2 using O(ϵ-2 log (1/ϵ) log d) bits of space. Additionally, the update time of this algorithm is O(log 1/ϵ) in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors x with ||x||∞=Θ(||x||_2), we show that the lower bound can be broken by giving an algorithm that uses O(ϵ-2 log d) bits of space which approximates ||x||∞ up to an additive error of ϵ||x||_2. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade off of HashPRG, we show that the update time of this algorithm is also O(log 1/ϵ) in the Word RAM model. ",
keywords = "Fast Update time, Pseudorandom Generators, Streaming Algorithms",
author = "Praneeth Kacham and Rasmus Pagh and Mikkel Thorup and Woodruff, {David P.}",
note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023 ; Conference date: 06-11-2023 Through 09-11-2023",
year = "2023",
doi = "10.1109/FOCS57990.2023.00093",
language = "English",
series = "Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS",
pages = "1515--1550",
booktitle = "Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023",
publisher = "IEEE Computer Society Press",
address = "United States",

}

RIS

TY - GEN

T1 - Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming

AU - Kacham, Praneeth

AU - Pagh, Rasmus

AU - Thorup, Mikkel

AU - Woodruff, David P.

N1 - Publisher Copyright: © 2023 IEEE.

PY - 2023

Y1 - 2023

N2 - We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, for example:•Andoni's F_p estimation algorithm for constant p > 2 (ICASSP, 2017) assumes a random oracle, but achieves optimal space and constant update time. Using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of ℓ_p sampling in a turnstile stream. Both of our algorithms use Õ(d1-2/p) bits of space and have O(1) update time.•For 0< p<2, the 1 pm ϵ approximate F_p estimation algorithm of Kane et al., (STOC, 2011) uses an optimal O(ϵ-2 log d) bits of space but has an update time of O(log 2(1/ϵ) log log (1/ϵ)). Using HashPRG, we show that if 1/√d ≤ ϵ ≤ 1/dc for an arbitrarily small constant c > 0, then we can obtain a 1 pm ϵ approximate F_p estimation algorithm that uses the optimal O(ϵ-2 log d) bits of space and has an update time of O(log d) in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation.•CountSketch, with the fine-grained error analysis of Minton and Price (SODA, 2014). For derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead. With HashPRG we obtain an efficient derandomization yielding the same asymptotic space as when assuming a random oracle. Our ability to obtain a time-efficient derandomization makes crucial use of HashPRG's symmetry. We also give the first derandomization of a recent private version of CountSketch.For a d-dimensional vector x being updated in a turnstile stream, we show that ||x||∞ can be estimated up to an additive error of ϵ||x||_2 using O(ϵ-2 log (1/ϵ) log d) bits of space. Additionally, the update time of this algorithm is O(log 1/ϵ) in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors x with ||x||∞=Θ(||x||_2), we show that the lower bound can be broken by giving an algorithm that uses O(ϵ-2 log d) bits of space which approximates ||x||∞ up to an additive error of ϵ||x||_2. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade off of HashPRG, we show that the update time of this algorithm is also O(log 1/ϵ) in the Word RAM model.

AB - We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, for example:•Andoni's F_p estimation algorithm for constant p > 2 (ICASSP, 2017) assumes a random oracle, but achieves optimal space and constant update time. Using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of ℓ_p sampling in a turnstile stream. Both of our algorithms use Õ(d1-2/p) bits of space and have O(1) update time.•For 0< p<2, the 1 pm ϵ approximate F_p estimation algorithm of Kane et al., (STOC, 2011) uses an optimal O(ϵ-2 log d) bits of space but has an update time of O(log 2(1/ϵ) log log (1/ϵ)). Using HashPRG, we show that if 1/√d ≤ ϵ ≤ 1/dc for an arbitrarily small constant c > 0, then we can obtain a 1 pm ϵ approximate F_p estimation algorithm that uses the optimal O(ϵ-2 log d) bits of space and has an update time of O(log d) in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation.•CountSketch, with the fine-grained error analysis of Minton and Price (SODA, 2014). For derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead. With HashPRG we obtain an efficient derandomization yielding the same asymptotic space as when assuming a random oracle. Our ability to obtain a time-efficient derandomization makes crucial use of HashPRG's symmetry. We also give the first derandomization of a recent private version of CountSketch.For a d-dimensional vector x being updated in a turnstile stream, we show that ||x||∞ can be estimated up to an additive error of ϵ||x||_2 using O(ϵ-2 log (1/ϵ) log d) bits of space. Additionally, the update time of this algorithm is O(log 1/ϵ) in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors x with ||x||∞=Θ(||x||_2), we show that the lower bound can be broken by giving an algorithm that uses O(ϵ-2 log d) bits of space which approximates ||x||∞ up to an additive error of ϵ||x||_2. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade off of HashPRG, we show that the update time of this algorithm is also O(log 1/ϵ) in the Word RAM model.

KW - Fast Update time

KW - Pseudorandom Generators

KW - Streaming Algorithms

U2 - 10.1109/FOCS57990.2023.00093

DO - 10.1109/FOCS57990.2023.00093

M3 - Article in proceedings

AN - SCOPUS:85179281031

T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

SP - 1515

EP - 1550

BT - Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023

PB - IEEE Computer Society Press

T2 - 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023

Y2 - 6 November 2023 through 9 November 2023

ER -

ID: 384253969