HyperLogLogLog: Cardinality Estimation With One Log More

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

HyperLogLogLog : Cardinality Estimation With One Log More. / Karppa, Matti; Pagh, Rasmus.

KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, Inc., 2022. p. 753-761.

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Karppa, M & Pagh, R 2022, HyperLogLogLog: Cardinality Estimation With One Log More. in KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, Inc., pp. 753-761, 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022, Washington, United States, 14/08/2022. https://doi.org/10.1145/3534678.3539246

APA

Karppa, M., & Pagh, R. (2022). HyperLogLogLog: Cardinality Estimation With One Log More. In KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 753-761). Association for Computing Machinery, Inc.. https://doi.org/10.1145/3534678.3539246

Vancouver

Karppa M, Pagh R. HyperLogLogLog: Cardinality Estimation With One Log More. In KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, Inc. 2022. p. 753-761 https://doi.org/10.1145/3534678.3539246

Author

Karppa, Matti ; Pagh, Rasmus. / HyperLogLogLog : Cardinality Estimation With One Log More. KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, Inc., 2022. pp. 753-761

Bibtex

@inproceedings{0977c59a89244f899ff859c01d97de6c,
title = "HyperLogLogLog: Cardinality Estimation With One Log More",
abstract = "We present HyperLogLogLog, a practical compression of the HyperLogLog sketch that compresses the sketch from O(mogog n) bits down to m log2log2log2 m + O(m+loglog n) bits for estimating the number of distinct elements∼n using m∼registers. The algorithm works as a drop-in replacement that preserves all estimation properties of the HyperLogLog sketch, it is possible to convert back and forth between the compressed and uncompressed representations, and the compressed sketch maintains mergeability in the compressed domain. The compressed sketch can be updated in amortized constant time, assuming n is sufficiently larger than m. We provide a C++ implementation of the sketch, and show by experimental evaluation against well-known implementations by Google and Apache that our implementation provides small sketches while maintaining competitive update and merge times. Concretely, we observed approximately a 40% reduction in the sketch size. Furthermore, we obtain as a corollary a theoretical algorithm that compresses the sketch down to mlog2log2log2log2 m+O(mlogloglog m/loglog m+loglog n) bits. ",
keywords = "cardinality estimation, distinct elements, hashing, hyperloglog",
author = "Matti Karppa and Rasmus Pagh",
note = "Publisher Copyright: {\textcopyright} 2022 ACM.; 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 ; Conference date: 14-08-2022 Through 18-08-2022",
year = "2022",
doi = "10.1145/3534678.3539246",
language = "English",
pages = "753--761",
booktitle = "KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery, Inc.",

}

RIS

TY - GEN

T1 - HyperLogLogLog

T2 - 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022

AU - Karppa, Matti

AU - Pagh, Rasmus

N1 - Publisher Copyright: © 2022 ACM.

PY - 2022

Y1 - 2022

N2 - We present HyperLogLogLog, a practical compression of the HyperLogLog sketch that compresses the sketch from O(mogog n) bits down to m log2log2log2 m + O(m+loglog n) bits for estimating the number of distinct elements∼n using m∼registers. The algorithm works as a drop-in replacement that preserves all estimation properties of the HyperLogLog sketch, it is possible to convert back and forth between the compressed and uncompressed representations, and the compressed sketch maintains mergeability in the compressed domain. The compressed sketch can be updated in amortized constant time, assuming n is sufficiently larger than m. We provide a C++ implementation of the sketch, and show by experimental evaluation against well-known implementations by Google and Apache that our implementation provides small sketches while maintaining competitive update and merge times. Concretely, we observed approximately a 40% reduction in the sketch size. Furthermore, we obtain as a corollary a theoretical algorithm that compresses the sketch down to mlog2log2log2log2 m+O(mlogloglog m/loglog m+loglog n) bits.

AB - We present HyperLogLogLog, a practical compression of the HyperLogLog sketch that compresses the sketch from O(mogog n) bits down to m log2log2log2 m + O(m+loglog n) bits for estimating the number of distinct elements∼n using m∼registers. The algorithm works as a drop-in replacement that preserves all estimation properties of the HyperLogLog sketch, it is possible to convert back and forth between the compressed and uncompressed representations, and the compressed sketch maintains mergeability in the compressed domain. The compressed sketch can be updated in amortized constant time, assuming n is sufficiently larger than m. We provide a C++ implementation of the sketch, and show by experimental evaluation against well-known implementations by Google and Apache that our implementation provides small sketches while maintaining competitive update and merge times. Concretely, we observed approximately a 40% reduction in the sketch size. Furthermore, we obtain as a corollary a theoretical algorithm that compresses the sketch down to mlog2log2log2log2 m+O(mlogloglog m/loglog m+loglog n) bits.

KW - cardinality estimation

KW - distinct elements

KW - hashing

KW - hyperloglog

UR - http://www.scopus.com/inward/record.url?scp=85136954603&partnerID=8YFLogxK

U2 - 10.1145/3534678.3539246

DO - 10.1145/3534678.3539246

M3 - Article in proceedings

AN - SCOPUS:85136954603

SP - 753

EP - 761

BT - KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery, Inc.

Y2 - 14 August 2022 through 18 August 2022

ER -

ID: 340691957