Bias Reduction for Sum Estimation

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Bias Reduction for Sum Estimation. / Eden, Talya; Tejs Houen, Jakob Bæk; Narayanan, Shyam; Rosenbaum, Will; Tětek, Jakub.

Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023. ed. / Nicole Megow; Adam Smith. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. p. 1-21, 62 (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 275).


Harvard

Eden, T, Tejs Houen, JB, Narayanan, S, Rosenbaum, W & Tětek, J 2023, Bias Reduction for Sum Estimation. in N Megow & A Smith (eds), Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023, 62, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Leibniz International Proceedings in Informatics, LIPIcs, vol. 275, pp. 1-21, 26th International Conference on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2023 and the 27th International Conference on Randomization and Computation, RANDOM 2023, Atlanta, United States, 11/09/2023. https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2023.62

APA

Eden, T., Tejs Houen, J. B., Narayanan, S., Rosenbaum, W., & Tětek, J. (2023). Bias Reduction for Sum Estimation. In N. Megow & A. Smith (Eds.), Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023 (pp. 1-21). [62] Schloss Dagstuhl - Leibniz-Zentrum für Informatik. Leibniz International Proceedings in Informatics, LIPIcs Vol. 275. https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2023.62

Vancouver

Eden T, Tejs Houen JB, Narayanan S, Rosenbaum W, Tětek J. Bias Reduction for Sum Estimation. In: Megow N, Smith A, editors. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. 2023. p. 1-21. 62. (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 275). https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2023.62

Author

Eden, Talya ; Tejs Houen, Jakob Bæk ; Narayanan, Shyam ; Rosenbaum, Will ; Tětek, Jakub. / Bias Reduction for Sum Estimation. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023. editor / Nicole Megow ; Adam Smith. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. pp. 1-21 (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 275).

BibTeX

@inproceedings{a3ef05a1f2ad454e8fe116efbd1983fa,

title = "Bias Reduction for Sum Estimation",

abstract = "In classical statistics and distribution testing, it is often assumed that elements can be sampled exactly from some distribution $P$, and that when an element $x$ is sampled, the probability $P(x)$ of sampling $x$ is also known. In this setting, recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution $Q$ that is sufficiently close to $P$. This phenomenon raises interesting questions: under what conditions is a ``noisy'' distribution $Q$ sufficient, and what is the algorithmic cost of coping with this noise? In this paper, we investigate these questions for the problem of estimating the sum of a multiset of $N$ real values $x_1, \ldots, x_N$. This problem is well-studied in the statistical literature in the case $P = Q$, where the Hansen-Hurwitz estimator [Annals of Mathematical Statistics, 1943] is frequently used. We assume that for some (known) distribution $P$, values are sampled from a distribution $Q$ that is pointwise close to $P$. That is, there is a parameter $\gamma < 1$ such that for all $x_i$, $(1 - \gamma)P(i) \le Q(i) \le (1 + \gamma)P(i)$. For every positive integer $k$ we define an estimator $\zeta_k$ for $\mu = \sum_i x_i$ whose bias is proportional to $\gamma^k$ (where our $\zeta_1$ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if $Q$ is pointwise $\gamma$-close to uniform and all $x_i \in \{0, 1\}$, for any $\varepsilon > 0$, we can estimate $\mu$ to within additive error $\varepsilon N$ using $m = \Theta(N^{1 - 1/k} / \varepsilon^{2/k})$ samples, where $k = \lceil (\lg \varepsilon) / (\lg \gamma) \rceil$. We then show that this sample complexity is essentially optimal. Interestingly, our upper and lower bounds show that the sample complexity need not vary uniformly with the desired error parameter $\varepsilon$: for some values of $\varepsilon$, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.",

keywords = "bias reduction, sample complexity, sublinear time algorithms, sum estimation",

author = "Talya Eden and {Tejs Houen}, {Jakob B{\ae}k} and Shyam Narayanan and Will Rosenbaum and Jakub T{\v e}tek",

note = "Publisher Copyright: {\textcopyright} 2023 Schloss Dagstuhl - Leibniz-Zentrum f{\"u}r Informatik GmbH, Dagstuhl Publishing. All rights reserved.; 26th International Conference on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2023 and the 27th International Conference on Randomization and Computation, RANDOM 2023; Conference date: 11-09-2023 through 13-09-2023",

year = "2023",

month = sep,

doi = "10.4230/LIPIcs.APPROX/RANDOM.2023.62",

language = "English",

volume = "275",

series = "Leibniz International Proceedings in Informatics, LIPIcs",

publisher = "Schloss Dagstuhl - Leibniz-Zentrum f{\"u}r Informatik",

pages = "1--21",

editor = "Nicole Megow and Adam Smith",

booktitle = "Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023",

}
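The abstract states that ζ1 reduces to the classical Hansen-Hurwitz estimator. As an illustration of why sampling from Q instead of P introduces bias, here is a minimal Python sketch of that classical estimator only (not the authors' ζk for k ≥ 2, which the abstract does not define). It assumes indices are drawn with replacement and P is known. The helper names hansen_hurwitz and expected_hh and the toy data are hypothetical. The exact expectation under Q is Σ_i Q(i) x_i / P(i), so the bias is Σ_i (Q(i)/P(i) − 1) x_i, which the pointwise condition bounds by γ Σ_i |x_i|; this is the k = 1 case of "bias proportional to γ^k".

```python
import random
from typing import Sequence


def hansen_hurwitz(values: Sequence[float], p: Sequence[float],
                   samples: Sequence[int]) -> float:
    """Hansen-Hurwitz estimate of sum(values) from indices drawn with replacement.

    Each sampled index i contributes values[i] / p[i], where p is the assumed
    (known) distribution P; the estimate is the mean contribution.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return sum(values[i] / p[i] for i in samples) / len(samples)


def expected_hh(values: Sequence[float], p: Sequence[float],
                q: Sequence[float]) -> float:
    """Exact expectation of the estimate when indices are really drawn from Q."""
    return sum(qi * x / pi for x, pi, qi in zip(values, p, q))


if __name__ == "__main__":
    # Hypothetical toy instance: N = 4 bits, P uniform, Q pointwise gamma-close to P.
    values = [1, 0, 1, 1]
    gamma = 0.1
    p = [0.25] * 4
    q = [0.25 * (1 + gamma), 0.25 * (1 + gamma),
         0.25 * (1 - gamma), 0.25 * (1 - gamma)]
    rng = random.Random(0)  # seeded for reproducibility
    samples = rng.choices(range(len(values)), weights=q, k=100_000)
    print(sum(values))                          # true sum mu
    print(expected_hh(values, p, q))            # expectation under the noisy Q
    print(hansen_hurwitz(values, p, samples))   # Monte Carlo estimate
```

In this toy instance Q tilts mass toward the first two indices, so the expectation is about 2.9 rather than µ = 3: a bias of −0.1, within the γ Σ_i |x_i| = 0.3 bound. Taking more samples shrinks the variance, but the estimate still concentrates around 2.9, not 3.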

RIS

TY - GEN

T1 - Bias Reduction for Sum Estimation

AU - Eden, Talya

AU - Tejs Houen, Jakob Bæk

AU - Narayanan, Shyam

AU - Rosenbaum, Will

AU - Tětek, Jakub

PY - 2023/9

Y1 - 2023/9

N2 - In classical statistics and distribution testing, it is often assumed that elements can be sampled exactly from some distribution $P$, and that when an element $x$ is sampled, the probability $P(x)$ of sampling $x$ is also known. In this setting, recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution $Q$ that is sufficiently close to $P$. This phenomenon raises interesting questions: under what conditions is a “noisy” distribution $Q$ sufficient, and what is the algorithmic cost of coping with this noise? In this paper, we investigate these questions for the problem of estimating the sum of a multiset of $N$ real values $x_1, \ldots, x_N$. This problem is well-studied in the statistical literature in the case $P = Q$, where the Hansen-Hurwitz estimator [Annals of Mathematical Statistics, 1943] is frequently used. We assume that for some (known) distribution $P$, values are sampled from a distribution $Q$ that is pointwise close to $P$. That is, there is a parameter $\gamma < 1$ such that for all $x_i$, $(1 - \gamma)P(i) \le Q(i) \le (1 + \gamma)P(i)$. For every positive integer $k$ we define an estimator $\zeta_k$ for $\mu = \sum_i x_i$ whose bias is proportional to $\gamma^k$ (where our $\zeta_1$ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if $Q$ is pointwise $\gamma$-close to uniform and all $x_i \in \{0, 1\}$, for any $\varepsilon > 0$, we can estimate $\mu$ to within additive error $\varepsilon N$ using $m = \Theta(N^{1 - 1/k} / \varepsilon^{2/k})$ samples, where $k = \lceil (\lg \varepsilon) / (\lg \gamma) \rceil$. We then show that this sample complexity is essentially optimal. Interestingly, our upper and lower bounds show that the sample complexity need not vary uniformly with the desired error parameter $\varepsilon$: for some values of $\varepsilon$, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.

AB - In classical statistics and distribution testing, it is often assumed that elements can be sampled exactly from some distribution $P$, and that when an element $x$ is sampled, the probability $P(x)$ of sampling $x$ is also known. In this setting, recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution $Q$ that is sufficiently close to $P$. This phenomenon raises interesting questions: under what conditions is a “noisy” distribution $Q$ sufficient, and what is the algorithmic cost of coping with this noise? In this paper, we investigate these questions for the problem of estimating the sum of a multiset of $N$ real values $x_1, \ldots, x_N$. This problem is well-studied in the statistical literature in the case $P = Q$, where the Hansen-Hurwitz estimator [Annals of Mathematical Statistics, 1943] is frequently used. We assume that for some (known) distribution $P$, values are sampled from a distribution $Q$ that is pointwise close to $P$. That is, there is a parameter $\gamma < 1$ such that for all $x_i$, $(1 - \gamma)P(i) \le Q(i) \le (1 + \gamma)P(i)$. For every positive integer $k$ we define an estimator $\zeta_k$ for $\mu = \sum_i x_i$ whose bias is proportional to $\gamma^k$ (where our $\zeta_1$ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if $Q$ is pointwise $\gamma$-close to uniform and all $x_i \in \{0, 1\}$, for any $\varepsilon > 0$, we can estimate $\mu$ to within additive error $\varepsilon N$ using $m = \Theta(N^{1 - 1/k} / \varepsilon^{2/k})$ samples, where $k = \lceil (\lg \varepsilon) / (\lg \gamma) \rceil$. We then show that this sample complexity is essentially optimal. Interestingly, our upper and lower bounds show that the sample complexity need not vary uniformly with the desired error parameter $\varepsilon$: for some values of $\varepsilon$, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.

KW - bias reduction

KW - sample complexity

KW - sublinear time algorithms

KW - sum estimation

U2 - 10.4230/LIPIcs.APPROX/RANDOM.2023.62

DO - 10.4230/LIPIcs.APPROX/RANDOM.2023.62

M3 - Article in proceedings

AN - SCOPUS:85171979766

T3 - Leibniz International Proceedings in Informatics, LIPIcs

SP - 1

EP - 21

BT - Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023

A2 - Megow, Nicole

A2 - Smith, Adam

PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik

T2 - 26th International Conference on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2023 and the 27th International Conference on Randomization and Computation, RANDOM 2023

Y2 - 11 September 2023 through 13 September 2023

ER -
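The choice of k in the abstract can be evaluated directly. For 0 < ε < 1 and 0 < γ < 1, k = ⌈(lg ε)/(lg γ)⌉ is exactly the smallest positive integer with γ^k ≤ ε (multiplying k ≥ lg ε / lg γ by lg γ < 0 flips the inequality). One way to read this is that it is the first k for which the γ^k-proportional bias of ζk drops to order ε. The Python sketch below uses hypothetical helpers choose_k and sample_bound_shape. It computes k with exact rational arithmetic, avoiding floating-point errors at boundaries such as ε = γ^k, and evaluates N^(1−1/k)/ε^(2/k) with the constant hidden by Θ dropped. Its output therefore shows scaling only, not an actual sample count.

```python
from fractions import Fraction


def choose_k(eps: Fraction, gamma: Fraction) -> int:
    """k = ceil(lg eps / lg gamma), computed without floating point.

    For 0 < eps < 1 and 0 < gamma < 1 this equals the smallest positive
    integer k with gamma**k <= eps.
    """
    if not (0 < eps < 1 and 0 < gamma < 1):
        raise ValueError("expects 0 < eps < 1 and 0 < gamma < 1")
    k = 1
    while gamma ** k > eps:  # exact Fraction comparison
        k += 1
    return k


def sample_bound_shape(n: int, eps: Fraction, gamma: Fraction) -> float:
    """N**(1 - 1/k) / eps**(2/k): the Theta(...) expression without its constant."""
    k = choose_k(eps, gamma)
    return n ** (1 - 1 / k) / float(eps) ** (2 / k)


if __name__ == "__main__":
    half = Fraction(1, 2)
    for eps in (Fraction(3, 10), Fraction(1, 4), Fraction(1, 5)):
        print(eps, choose_k(eps, half), sample_bound_shape(10**6, eps, half))
```

With γ = 1/2, every ε in [1/4, 1/2) gives k = 2 and an N^(1/2) dependence. By contrast, ε = 1/5 already gives k = 3 and N^(2/3). This illustrates the non-uniform dependence on ε that the abstract describes.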

ID: 382559688

Department of Computer Science