Simple Set Sketching

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Documents

  • Fulltext

    Accepted author manuscript, 687 KB, PDF document

Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For an keys and n buckets, where α is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions.
We show that if this collision resolution strategy is repeated three times independently the situation reverses: If α is below a threshold of ≈ 0.81 then we can recover the set of all inserted keys in linear time with high probability.
Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of quotienting, namely that some bits of the key are implicit in the location where it is stored. We let those serve as an implicit checksum. These bits are not quite enough to ensure that no errors occur and the main technical challenge is to show that decoding can recover from these errors.
Original languageEnglish
Title of host publicationProceedings, 2023 Symposium on Simplicity in Algorithms (SOSA)
EditorsTelikepalli Kavitha, Kurt Mehlhorn
PublisherSociety for Industrial and Applied Mathematics
Publication date2023
Pages228-241
ISBN (Electronic)978-1-61197-758-5
DOIs
Publication statusPublished - 2023
Event2023 Symposium on Simplicity in Algorithms (SOSA) - Florence, Italy
Duration: 23 Jan 202325 Jan 2023

Conference

Conference2023 Symposium on Simplicity in Algorithms (SOSA)
LandItaly
ByFlorence
Periode23/01/202325/01/2023

ID: 382689570