Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming. / Kacham, Praneeth; Pagh, Rasmus; Thorup, Mikkel; Woodruff, David P.
Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023. IEEE Computer Society Press, 2023. p. 1515-1550 (Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS).Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - GEN
T1 - Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming
AU - Kacham, Praneeth
AU - Pagh, Rasmus
AU - Thorup, Mikkel
AU - Woodruff, David P.
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, for example:•Andoni's F_p estimation algorithm for constant p > 2 (ICASSP, 2017) assumes a random oracle, but achieves optimal space and constant update time. Using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of ℓ_p sampling in a turnstile stream. Both of our algorithms use Õ(d1-2/p) bits of space and have O(1) update time.•For 0< p<2, the 1 pm ϵ approximate F_p estimation algorithm of Kane et al., (STOC, 2011) uses an optimal O(ϵ-2 log d) bits of space but has an update time of O(log 2(1/ϵ) log log (1/ϵ)). Using HashPRG, we show that if 1/√d ≤ ϵ ≤ 1/dc for an arbitrarily small constant c > 0, then we can obtain a 1 pm ϵ approximate F_p estimation algorithm that uses the optimal O(ϵ-2 log d) bits of space and has an update time of O(log d) in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation.•CountSketch, with the fine-grained error analysis of Minton and Price (SODA, 2014). For derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead. With HashPRG we obtain an efficient derandomization yielding the same asymptotic space as when assuming a random oracle. Our ability to obtain a time-efficient derandomization makes crucial use of HashPRG's symmetry. We also give the first derandomization of a recent private version of CountSketch.For a d-dimensional vector x being updated in a turnstile stream, we show that ||x||∞ can be estimated up to an additive error of ϵ||x||_2 using O(ϵ-2 log (1/ϵ) log d) bits of space. Additionally, the update time of this algorithm is O(log 1/ϵ) in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors x with ||x||∞=Θ(||x||_2), we show that the lower bound can be broken by giving an algorithm that uses O(ϵ-2 log d) bits of space which approximates ||x||∞ up to an additive error of ϵ||x||_2. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade off of HashPRG, we show that the update time of this algorithm is also O(log 1/ϵ) in the Word RAM model.
AB - We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, for example:•Andoni's F_p estimation algorithm for constant p > 2 (ICASSP, 2017) assumes a random oracle, but achieves optimal space and constant update time. Using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of ℓ_p sampling in a turnstile stream. Both of our algorithms use Õ(d1-2/p) bits of space and have O(1) update time.•For 0< p<2, the 1 pm ϵ approximate F_p estimation algorithm of Kane et al., (STOC, 2011) uses an optimal O(ϵ-2 log d) bits of space but has an update time of O(log 2(1/ϵ) log log (1/ϵ)). Using HashPRG, we show that if 1/√d ≤ ϵ ≤ 1/dc for an arbitrarily small constant c > 0, then we can obtain a 1 pm ϵ approximate F_p estimation algorithm that uses the optimal O(ϵ-2 log d) bits of space and has an update time of O(log d) in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation.•CountSketch, with the fine-grained error analysis of Minton and Price (SODA, 2014). For derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead. With HashPRG we obtain an efficient derandomization yielding the same asymptotic space as when assuming a random oracle. Our ability to obtain a time-efficient derandomization makes crucial use of HashPRG's symmetry. We also give the first derandomization of a recent private version of CountSketch.For a d-dimensional vector x being updated in a turnstile stream, we show that ||x||∞ can be estimated up to an additive error of ϵ||x||_2 using O(ϵ-2 log (1/ϵ) log d) bits of space. Additionally, the update time of this algorithm is O(log 1/ϵ) in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors x with ||x||∞=Θ(||x||_2), we show that the lower bound can be broken by giving an algorithm that uses O(ϵ-2 log d) bits of space which approximates ||x||∞ up to an additive error of ϵ||x||_2. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade off of HashPRG, we show that the update time of this algorithm is also O(log 1/ϵ) in the Word RAM model.
KW - Fast Update time
KW - Pseudorandom Generators
KW - Streaming Algorithms
U2 - 10.1109/FOCS57990.2023.00093
DO - 10.1109/FOCS57990.2023.00093
M3 - Article in proceedings
AN - SCOPUS:85179281031
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 1515
EP - 1550
BT - Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023
PB - IEEE Computer Society Press
T2 - 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023
Y2 - 6 November 2023 through 9 November 2023
ER -
ID: 384253969