Efficient differentially private F0 linear sketching

Efficient differentially private F₀linear sketching

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Efficient differentially private F₀linear sketching. / Pagh, Rasmus; Stausholm, Nina Mesing.

24th International Conference on Database Theory, ICDT 2021. ed. / Ke Yi; Zhewei Wei. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2021. p. 1-19 18 (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 186).

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Harvard

Pagh, R & Stausholm, NM 2021, Efficient differentially private F₀linear sketching. in K Yi & Z Wei (eds), 24th International Conference on Database Theory, ICDT 2021., 18, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, Leibniz International Proceedings in Informatics, LIPIcs, vol. 186, pp. 1-19, 24th International Conference on Database Theory, ICDT 2021, Nicosia, Cyprus, 23/03/2021. https://doi.org/10.4230/LIPIcs.ICDT.2021.18

APA

Pagh, R., & Stausholm, N. M. (2021). Efficient differentially private F₀linear sketching. In K. Yi, & Z. Wei (Eds.), 24th International Conference on Database Theory, ICDT 2021 (pp. 1-19). [18] Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. Leibniz International Proceedings in Informatics, LIPIcs Vol. 186 https://doi.org/10.4230/LIPIcs.ICDT.2021.18

Vancouver

Pagh R, Stausholm NM. Efficient differentially private F₀linear sketching. In Yi K, Wei Z, editors, 24th International Conference on Database Theory, ICDT 2021. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2021. p. 1-19. 18. (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 186). https://doi.org/10.4230/LIPIcs.ICDT.2021.18

Author

Pagh, Rasmus ; Stausholm, Nina Mesing. / Efficient differentially private F₀linear sketching. 24th International Conference on Database Theory, ICDT 2021. editor / Ke Yi ; Zhewei Wei. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2021. pp. 1-19 (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 186).

Bibtex

@inproceedings{b7db7829860a43529a5f34bdfac18a68,

title = "Efficient differentially private F0 linear sketching",

abstract = "A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work we consider how to construct sketches for weighted F0, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector w ϵ (0, 1]u be given. For x ϵ {0, 1}u we are interested in estimating ||x o w||1 where o is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al. (STOC 1998), we introduce a sketch (depending on w) that is linear over GF(2), mapping a vector x ϵ {0, 1}u to Hx ϵ {0, 1}t for a matrix H sampled from a suitable distribution H. Differential privacy is achieved by using randomized response, flipping each bit of Hx with probability p < 1/2. That is, for a vector φ ϵ {0, 1}t where Pr[(f)j = 1] = p independently for each entry j, we consider the noisy sketch Hx + f, where the addition of noise happens over GF(2). We show that for every choice of 0 < β < 1 and ϵ = O(1) there exists p < 1/2 and a distribution H of linear sketches of size t = O(log2(u)ϵ-2β-2) such that: 1. For random H ~ H and noise vector f, given Hx + f we can compute an estimate of ||x o w||1 that is accurate within a factor 1 ± β, plus additive error O(log(u)ϵ-2β-2), w. p. 1 - u-1, and 2. For every H ~ H, Hx + f is e-differentially private over the randomness in f. The special case w = (1, . . ., 1) is unweighted F0. Previously, Mir et al. (PODS 2011) and Kenthapadi et al. (J. Priv. Confidentiality 2013) had described a differentially private way of sketching unweighted F0, but the algorithms for calibrating noise to their sketches are not computationally efficient, either using quasipolynomial time in the sketch size or superlinear time in the universe size u. For fixed e the size of our sketch is polynomially related to the lower bound of Ω(log(u)β-2)bits by Jayram & Woodruff (Trans. Algorithms 2013). The additive error is comparable to the bound of O(1/ϵ) of Hardt & Talwar (STOC 2010). An application of our sketch is that two sketches can be added to form a noisy sketch of the form H(x1 +x2)+(f1 +f2), which allows us to estimate ||(x1 + x2) || w||1. Since addition is over GF(2), this is the weight of the symmetric difference of the vectors x1 and x2. Recent work has shown how to privately and efficiently compute an estimate for the symmetric difference size of two sets using (non-linear) sketches such as FM-sketches and Bloom Filters, but these methods have an error bound no better than O(√m), where m is an upper bound on ||x1||0 and ||x2||0. This improves previous work when β = o (1/√m and log(u)/ϵ = mo(1). In conclusion our results both improve the efficiency of existing methods for unweighted F0 estimation and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union between two input streams.",

keywords = "Differential Privacy, Linear Sketches, Weighted F0 Estimation",

author = "Rasmus Pagh and Stausholm, {Nina Mesing}",

year = "2021",

doi = "10.4230/LIPIcs.ICDT.2021.18",

language = "English",

series = "Leibniz International Proceedings in Informatics, LIPIcs",

publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",

pages = "1--19",

editor = "Ke Yi and Zhewei Wei",

booktitle = "24th International Conference on Database Theory, ICDT 2021",

note = "24th International Conference on Database Theory, ICDT 2021 ; Conference date: 23-03-2021 Through 26-03-2021",

}

RIS

TY - GEN

T1 - Efficient differentially private F0 linear sketching

AU - Pagh, Rasmus

AU - Stausholm, Nina Mesing

PY - 2021

Y1 - 2021

N2 - A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work we consider how to construct sketches for weighted F0, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector w ϵ (0, 1]u be given. For x ϵ {0, 1}u we are interested in estimating ||x o w||1 where o is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al. (STOC 1998), we introduce a sketch (depending on w) that is linear over GF(2), mapping a vector x ϵ {0, 1}u to Hx ϵ {0, 1}t for a matrix H sampled from a suitable distribution H. Differential privacy is achieved by using randomized response, flipping each bit of Hx with probability p < 1/2. That is, for a vector φ ϵ {0, 1}t where Pr[(f)j = 1] = p independently for each entry j, we consider the noisy sketch Hx + f, where the addition of noise happens over GF(2). We show that for every choice of 0 < β < 1 and ϵ = O(1) there exists p < 1/2 and a distribution H of linear sketches of size t = O(log2(u)ϵ-2β-2) such that: 1. For random H ~ H and noise vector f, given Hx + f we can compute an estimate of ||x o w||1 that is accurate within a factor 1 ± β, plus additive error O(log(u)ϵ-2β-2), w. p. 1 - u-1, and 2. For every H ~ H, Hx + f is e-differentially private over the randomness in f. The special case w = (1, . . ., 1) is unweighted F0. Previously, Mir et al. (PODS 2011) and Kenthapadi et al. (J. Priv. Confidentiality 2013) had described a differentially private way of sketching unweighted F0, but the algorithms for calibrating noise to their sketches are not computationally efficient, either using quasipolynomial time in the sketch size or superlinear time in the universe size u. For fixed e the size of our sketch is polynomially related to the lower bound of Ω(log(u)β-2)bits by Jayram & Woodruff (Trans. Algorithms 2013). The additive error is comparable to the bound of O(1/ϵ) of Hardt & Talwar (STOC 2010). An application of our sketch is that two sketches can be added to form a noisy sketch of the form H(x1 +x2)+(f1 +f2), which allows us to estimate ||(x1 + x2) || w||1. Since addition is over GF(2), this is the weight of the symmetric difference of the vectors x1 and x2. Recent work has shown how to privately and efficiently compute an estimate for the symmetric difference size of two sets using (non-linear) sketches such as FM-sketches and Bloom Filters, but these methods have an error bound no better than O(√m), where m is an upper bound on ||x1||0 and ||x2||0. This improves previous work when β = o (1/√m and log(u)/ϵ = mo(1). In conclusion our results both improve the efficiency of existing methods for unweighted F0 estimation and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union between two input streams.

AB - A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work we consider how to construct sketches for weighted F0, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector w ϵ (0, 1]u be given. For x ϵ {0, 1}u we are interested in estimating ||x o w||1 where o is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al. (STOC 1998), we introduce a sketch (depending on w) that is linear over GF(2), mapping a vector x ϵ {0, 1}u to Hx ϵ {0, 1}t for a matrix H sampled from a suitable distribution H. Differential privacy is achieved by using randomized response, flipping each bit of Hx with probability p < 1/2. That is, for a vector φ ϵ {0, 1}t where Pr[(f)j = 1] = p independently for each entry j, we consider the noisy sketch Hx + f, where the addition of noise happens over GF(2). We show that for every choice of 0 < β < 1 and ϵ = O(1) there exists p < 1/2 and a distribution H of linear sketches of size t = O(log2(u)ϵ-2β-2) such that: 1. For random H ~ H and noise vector f, given Hx + f we can compute an estimate of ||x o w||1 that is accurate within a factor 1 ± β, plus additive error O(log(u)ϵ-2β-2), w. p. 1 - u-1, and 2. For every H ~ H, Hx + f is e-differentially private over the randomness in f. The special case w = (1, . . ., 1) is unweighted F0. Previously, Mir et al. (PODS 2011) and Kenthapadi et al. (J. Priv. Confidentiality 2013) had described a differentially private way of sketching unweighted F0, but the algorithms for calibrating noise to their sketches are not computationally efficient, either using quasipolynomial time in the sketch size or superlinear time in the universe size u. For fixed e the size of our sketch is polynomially related to the lower bound of Ω(log(u)β-2)bits by Jayram & Woodruff (Trans. Algorithms 2013). The additive error is comparable to the bound of O(1/ϵ) of Hardt & Talwar (STOC 2010). An application of our sketch is that two sketches can be added to form a noisy sketch of the form H(x1 +x2)+(f1 +f2), which allows us to estimate ||(x1 + x2) || w||1. Since addition is over GF(2), this is the weight of the symmetric difference of the vectors x1 and x2. Recent work has shown how to privately and efficiently compute an estimate for the symmetric difference size of two sets using (non-linear) sketches such as FM-sketches and Bloom Filters, but these methods have an error bound no better than O(√m), where m is an upper bound on ||x1||0 and ||x2||0. This improves previous work when β = o (1/√m and log(u)/ϵ = mo(1). In conclusion our results both improve the efficiency of existing methods for unweighted F0 estimation and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union between two input streams.

KW - Differential Privacy

KW - Linear Sketches

KW - Weighted F0 Estimation

UR - http://www.scopus.com/inward/record.url?scp=85115270312&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.ICDT.2021.18

DO - 10.4230/LIPIcs.ICDT.2021.18

M3 - Article in proceedings

AN - SCOPUS:85115270312

T3 - Leibniz International Proceedings in Informatics, LIPIcs

SP - 1

EP - 19

BT - 24th International Conference on Database Theory, ICDT 2021

A2 - Yi, Ke

A2 - Wei, Zhewei

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

T2 - 24th International Conference on Database Theory, ICDT 2021

Y2 - 23 March 2021 through 26 March 2021

ER -

ID: 301142097

Department of Computer Science