DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search. / Karppa, Matti; Aumüller, Martin; Pagh, Rasmus.

Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. PMLR, 2022. p. 3108-3137 (Proceedings of Machine Learning Research, Vol. 151).


Harvard

Karppa, M, Aumüller, M & Pagh, R 2022, DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search. in Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. PMLR, Proceedings of Machine Learning Research, vol. 151, pp. 3108-3137, 25th International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual, 28/03/2022. <https://proceedings.mlr.press/v151/karppa22a.html>

APA

Karppa, M., Aumüller, M., & Pagh, R. (2022). DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (pp. 3108-3137). PMLR. (Proceedings of Machine Learning Research; Vol. 151). https://proceedings.mlr.press/v151/karppa22a.html

Vancouver

Karppa M, Aumüller M, Pagh R. DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. PMLR; 2022. p. 3108-3137. (Proceedings of Machine Learning Research, Vol. 151).

Author

Karppa, Matti ; Aumüller, Martin ; Pagh, Rasmus. / DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. PMLR, 2022. pp. 3108-3137 (Proceedings of Machine Learning Research, Vol. 151).

Bibtex

@inproceedings{d7a0e5c0b1b04a09803b2c81c6b519ad,
title = "DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search",
abstract = "Kernel Density Estimation (KDE) is a nonparametric method for estimatig the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in algorithms for nearest neighbor algorithms. We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN) where we apply Approximate Nearest Neighbor (ANN) algorithms as a black box subroutine to compute an unbiased KDE. The idea is to find points that have a large contribution to the KDE using ANN, compute their contribution exactly, and approximate the remainder with Random Sampling (RS). We present a theoretical argument that supports the idea that an ANN subroutine can speed up the evaluation. Furthermore, we provide a C++ implementation with a Python interface that can make use of an arbitrary ANN implementation as a subroutine for KDE evaluation. We show empirically that our implementation outperforms state of the art implementations in all high dimensional datasets we considered, and matches the performance of RS in cases where the ANN yield no gains in performance.",
author = "Matti Karppa and Martin Aum{\"u}ller and Rasmus Pagh",
year = "2022",
language = "English",
series = "Proceedings of Machine Learning Research",
pages = "3108--3137",
booktitle = "Proceedings of the 25th International Conference on Artificial Intelligence and Statistics",
publisher = "PMLR",
note = "25th International Conference on Artificial Intelligence and Statistics (AISTATS) ; Conference date: 28-03-2022 Through 30-03-2022",

}

RIS

TY - GEN

T1 - DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search

T2 - 25th International Conference on Artificial Intelligence and Statistics (AISTATS)

AU - Karppa, Matti

AU - Aumüller, Martin

AU - Pagh, Rasmus

PY - 2022

Y1 - 2022

N2 - Kernel Density Estimation (KDE) is a nonparametric method for estimating the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in algorithms for nearest neighbor search. We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN) in which we apply Approximate Nearest Neighbor (ANN) algorithms as a black-box subroutine to compute an unbiased KDE. The idea is to find points that have a large contribution to the KDE using ANN, compute their contribution exactly, and approximate the remainder with Random Sampling (RS). We present a theoretical argument that supports the idea that an ANN subroutine can speed up the evaluation. Furthermore, we provide a C++ implementation with a Python interface that can make use of an arbitrary ANN implementation as a subroutine for KDE evaluation. We show empirically that our implementation outperforms state-of-the-art implementations on all high-dimensional datasets we considered, and matches the performance of RS in cases where the ANN yields no gains in performance.

AB - Kernel Density Estimation (KDE) is a nonparametric method for estimating the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in algorithms for nearest neighbor search. We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN) in which we apply Approximate Nearest Neighbor (ANN) algorithms as a black-box subroutine to compute an unbiased KDE. The idea is to find points that have a large contribution to the KDE using ANN, compute their contribution exactly, and approximate the remainder with Random Sampling (RS). We present a theoretical argument that supports the idea that an ANN subroutine can speed up the evaluation. Furthermore, we provide a C++ implementation with a Python interface that can make use of an arbitrary ANN implementation as a subroutine for KDE evaluation. We show empirically that our implementation outperforms state-of-the-art implementations on all high-dimensional datasets we considered, and matches the performance of RS in cases where the ANN yields no gains in performance.

M3 - Article in proceedings

T3 - Proceedings of Machine Learning Research

SP - 3108

EP - 3137

BT - Proceedings of the 25th International Conference on Artificial Intelligence and Statistics

PB - PMLR

Y2 - 28 March 2022 through 30 March 2022

ER -

ID: 340695306
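
Illustrative sketch

The abstract describes DEANN only in prose: compute the kernel contributions of the query's (approximate) nearest neighbors exactly, and estimate the remaining contribution by random sampling. The following minimal Python sketch illustrates that two-step estimator. It is not the authors' C++/Python implementation: the function deann_estimate, the Gaussian kernel choice, the brute-force k-NN step (standing in for the arbitrary ANN black box the paper plugs in), and the parameters k, m, and bandwidth are all assumptions made here for illustration.

import numpy as np

def gaussian_kernel(q, X, bandwidth):
    # Kernel values k(q, x) = exp(-||q - x||^2 / (2 h^2)) for each row x of X.
    # (Gaussian kernel assumed for illustration; the paper supports other kernels.)
    d2 = np.sum((X - q) ** 2, axis=1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def deann_estimate(q, X, k, m, bandwidth, rng=None):
    # Sketch of the idea in the abstract: handle the k points with the largest
    # kernel contribution exactly, estimate the rest by random sampling.
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]

    # (1) Nearest-neighbor step. Brute-force k-NN stands in here for whatever
    # ANN implementation is used as a black-box subroutine.
    d2 = np.sum((X - q) ** 2, axis=1)
    nn_idx = np.argpartition(d2, k)[:k]
    near = gaussian_kernel(q, X[nn_idx], bandwidth).sum()

    # (2) Random-sampling step over the complement of the neighbor set,
    # rescaled by (n - k) / m so that, when the k returned neighbors are exact,
    # the combined estimator equals the exact KDE in expectation (unbiased).
    mask = np.ones(n, dtype=bool)
    mask[nn_idx] = False
    sample = rng.choice(np.flatnonzero(mask), size=m, replace=True)
    tail = (n - k) / m * gaussian_kernel(q, X[sample], bandwidth).sum()

    return (near + tail) / n

# Example usage (synthetic data; all sizes chosen arbitrarily):
rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 50))
q = rng.standard_normal(50)
print(deann_estimate(q, X, k=100, m=200, bandwidth=5.0, rng=rng))

The (n - k) / m rescaling is what makes the sampled tail match the exact sum over the n - k non-neighbors in expectation; the paper's analysis additionally covers the case where the ANN subroutine returns only approximate neighbors.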