Bias Reduction for Sum Estimation

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Talya Eden
Jakob Bæk Tejs Houen
Shyam Narayanan
Will Rosenbaum
Jakub Tetek

In classical statistics and distribution testing, it is often assumed that elements can be sampled exactly from some distribution P, and that when an element x is sampled, the probability P(x) of sampling x is also known. In this setting, recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution Q that is sufficiently close to P. This phenomenon raises interesting questions: under what conditions is a “noisy” distribution Q sufficient, and what is the algorithmic cost of coping with this noise? In this paper, we investigate these questions for the problem of estimating the sum of a multiset of N real values x_1, ..., x_N. This problem is well studied in the statistical literature in the case P = Q, where the Hansen-Hurwitz estimator [Annals of Mathematical Statistics, 1943] is frequently used. We assume that for some (known) distribution P, values are sampled from a distribution Q that is pointwise close to P. That is, there is a parameter γ < 1 such that for all x_i, (1 − γ)P(i) ≤ Q(i) ≤ (1 + γ)P(i). For every positive integer k we define an estimator ζ_k for µ = Σ_i x_i whose bias is proportional to γ^k (where our ζ_1 reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if Q is pointwise γ-close to uniform and all x_i ∈ {0, 1}, for any ε > 0, we can estimate µ to within additive error εN using m = Θ(N^{1−1/k}/ε^{2/k}) samples, where k = ⌈(lg ε)/(lg γ)⌉. We then show that this sample complexity is essentially optimal. Interestingly, our upper and lower bounds show that the sample complexity need not vary uniformly with the desired error parameter ε: for some values of ε, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.
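The setting in the abstract can be illustrated with a minimal Python sketch. It covers only the classical Hansen-Hurwitz estimator (the k = 1 case) and the formula for k; the abstract does not define ζ_k for k ≥ 2, so no attempt is made to reproduce it here. The function names and the toy values are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def hansen_hurwitz(values, p, sample_indices):
    """Classical Hansen-Hurwitz estimate: the average of x_i / P(i)
    over the sampled indices (hypothetical helper, not the paper's code)."""
    return sum(values[i] / p[i] for i in sample_indices) / len(sample_indices)


def expected_hh_under_q(values, p, q):
    """Exact expectation of a one-sample estimate when the index is
    actually drawn from q but reweighted with the assumed p."""
    return sum(q[i] * values[i] / p[i] for i in range(len(values)))


def order_k(eps, gamma):
    """k = ceil(lg eps / lg gamma), the formula from the abstract.
    Assumes 0 < eps < 1 and 0 < gamma < 1."""
    return math.ceil(math.log2(eps) / math.log2(gamma))


if __name__ == "__main__":
    # Toy instance (assumed): N = 4 binary values, so mu = 3.
    values = [1, 0, 1, 1]
    p = [0.25, 0.25, 0.25, 0.25]   # known distribution P (uniform)
    q = [0.30, 0.20, 0.30, 0.20]   # actual distribution Q, gamma = 0.2
    gamma = 0.2

    mu = sum(values)
    bias = expected_hh_under_q(values, p, q) - mu
    print("bias under Q:", bias, "bound gamma * mu:", gamma * mu)

    # Empirical estimate from samples drawn from Q.
    rng = random.Random(0)
    idx = rng.choices(range(len(values)), weights=q, k=10_000)
    print("empirical estimate:", hansen_hurwitz(values, p, idx))

    print("k for eps = 0.05, gamma = 0.5:", order_k(0.05, 0.5))
```

Since every ratio Q(i)/P(i) lies in [1 − γ, 1 + γ], the expectation of this estimator under Q lies between (1 − γ)µ and (1 + γ)µ when all values are nonnegative, so its bias is at most γµ. This matches the abstract's description of the k = 1 case.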

Original language	English
Title of host publication	Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023
Editors	Nicole Megow, Adam Smith
Publisher	Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date	Sep 2023
Pages	1-21
Article number	62
ISBN (Electronic)	9783959772969
DOIs	https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2023.62
Publication status	Published - Sep 2023
Event	26th International Conference on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2023 and the 27th International Conference on Randomization and Computation, RANDOM 2023 - Atlanta, United States Duration: 11 Sep 2023 → 13 Sep 2023

Conference

Conference	26th International Conference on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2023 and the 27th International Conference on Randomization and Computation, RANDOM 2023
Country	United States
City	Atlanta
Period	11/09/2023 → 13/09/2023

Series	Leibniz International Proceedings in Informatics, LIPIcs
Volume	275
ISSN	1868-8969

Research areas

bias reduction, sample complexity, sublinear time algorithms, sum estimation

ID: 382559688

Department of Computer Science