Simple Set Sketching

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

Simple Set Sketching. / Bæk Tejs Houen, Jakob; Pagh, Rasmus; Walzer, Stefan.

Proceedings, 2023 Symposium on Simplicity in Algorithms (SOSA). ed. / Telikepalli Kavitha; Kurt Mehlhorn. Society for Industrial and Applied Mathematics, 2023. p. 228-241.

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Bæk Tejs Houen, J, Pagh, R & Walzer, S 2023, Simple Set Sketching. in T Kavitha & K Mehlhorn (eds), Proceedings, 2023 Symposium on Simplicity in Algorithms (SOSA). Society for Industrial and Applied Mathematics, pp. 228-241, 2023 Symposium on Simplicity in Algorithms (SOSA), Florence, Italy, 23/01/2023. https://doi.org/10.1137/1.9781611977585.ch21

APA

Bæk Tejs Houen, J., Pagh, R., & Walzer, S. (2023). Simple Set Sketching. In T. Kavitha, & K. Mehlhorn (Eds.), Proceedings, 2023 Symposium on Simplicity in Algorithms (SOSA) (pp. 228-241). Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611977585.ch21

Vancouver

Bæk Tejs Houen J, Pagh R, Walzer S. Simple Set Sketching. In Kavitha T, Mehlhorn K, editors, Proceedings, 2023 Symposium on Simplicity in Algorithms (SOSA). Society for Industrial and Applied Mathematics. 2023. p. 228-241 https://doi.org/10.1137/1.9781611977585.ch21

Author

Bæk Tejs Houen, Jakob ; Pagh, Rasmus ; Walzer, Stefan. / Simple Set Sketching. Proceedings, 2023 Symposium on Simplicity in Algorithms (SOSA). editor / Telikepalli Kavitha ; Kurt Mehlhorn. Society for Industrial and Applied Mathematics, 2023. pp. 228-241

Bibtex

@inproceedings{2d8d220d18024f51bba1a60b75ca8398,
title = "Simple Set Sketching",
abstract = "Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For an keys and n buckets, where α is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions.We show that if this collision resolution strategy is repeated three times independently the situation reverses: If α is below a threshold of ≈ 0.81 then we can recover the set of all inserted keys in linear time with high probability.Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of quotienting, namely that some bits of the key are implicit in the location where it is stored. We let those serve as an implicit checksum. These bits are not quite enough to ensure that no errors occur and the main technical challenge is to show that decoding can recover from these errors.",
author = "{B{\ae}k Tejs Houen}, Jakob and Rasmus Pagh and Stefan Walzer",
year = "2023",
doi = "10.1137/1.9781611977585.ch21",
language = "English",
pages = "228--241",
editor = "Telikepalli Kavitha and Kurt Mehlhorn",
booktitle = "Proceedings, 2023 Symposium on Simplicity in Algorithms (SOSA)",
publisher = "Society for Industrial and Applied Mathematics",
address = "United States",
note = "2023 Symposium on Simplicity in Algorithms (SOSA) ; Conference date: 23-01-2023 Through 25-01-2023",

}

RIS

TY - GEN

T1 - Simple Set Sketching

AU - Bæk Tejs Houen, Jakob

AU - Pagh, Rasmus

AU - Walzer, Stefan

PY - 2023

Y1 - 2023

N2 - Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For an keys and n buckets, where α is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions.We show that if this collision resolution strategy is repeated three times independently the situation reverses: If α is below a threshold of ≈ 0.81 then we can recover the set of all inserted keys in linear time with high probability.Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of quotienting, namely that some bits of the key are implicit in the location where it is stored. We let those serve as an implicit checksum. These bits are not quite enough to ensure that no errors occur and the main technical challenge is to show that decoding can recover from these errors.

AB - Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For an keys and n buckets, where α is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions.We show that if this collision resolution strategy is repeated three times independently the situation reverses: If α is below a threshold of ≈ 0.81 then we can recover the set of all inserted keys in linear time with high probability.Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of quotienting, namely that some bits of the key are implicit in the location where it is stored. We let those serve as an implicit checksum. These bits are not quite enough to ensure that no errors occur and the main technical challenge is to show that decoding can recover from these errors.

U2 - 10.1137/1.9781611977585.ch21

DO - 10.1137/1.9781611977585.ch21

M3 - Article in proceedings

SP - 228

EP - 241

BT - Proceedings, 2023 Symposium on Simplicity in Algorithms (SOSA)

A2 - Kavitha, Telikepalli

A2 - Mehlhorn, Kurt

PB - Society for Industrial and Applied Mathematics

T2 - 2023 Symposium on Simplicity in Algorithms (SOSA)

Y2 - 23 January 2023 through 25 January 2023

ER -

ID: 382689570