DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; Peer-reviewed

Documents

  • DEANN

    Publisher's published version, 595 KB, PDF document

Kernel Density Estimation (KDE) is a nonparametric method for estimating the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in algorithms for nearest neighbor search. We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN) where we apply Approximate Nearest Neighbor (ANN) algorithms as a black-box subroutine to compute an unbiased KDE. The idea is to find points that have a large contribution to the KDE using ANN, compute their contribution exactly, and approximate the remainder with Random Sampling (RS). We present a theoretical argument that supports the idea that an ANN subroutine can speed up the evaluation. Furthermore, we provide a C++ implementation with a Python interface that can make use of an arbitrary ANN implementation as a subroutine for KDE evaluation. We show empirically that our implementation outperforms state-of-the-art implementations on all high-dimensional datasets we considered, and matches the performance of RS in cases where the ANN yields no gains in performance.
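
The estimator described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' C++ implementation or its Python interface. It is one possible reading of the idea under stated assumptions: a Gaussian kernel with bandwidth h, a brute-force exact nearest-neighbor search standing in for the ANN subroutine, and hypothetical function names (gaussian_kernel, exact_kde, deann_style_kde) chosen only for this example. The k nearest points are summed exactly, and the remaining n - k points are estimated from m uniform random samples, rescaled so that the overall estimate stays unbiased.

```python
import numpy as np


def gaussian_kernel(q, X, h):
    """Gaussian kernel values K(q, x) for every row x of X (assumed kernel)."""
    d2 = np.sum((X - q) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * h * h))


def exact_kde(q, X, h):
    """Reference KDE: the mean kernel value over all n data points."""
    return gaussian_kernel(q, X, h).mean()


def deann_style_kde(q, X, h, k, m, rng):
    """Sketch of 'exact near part + sampled remainder' (hypothetical helper).

    k:   number of nearest neighbors whose contribution is computed exactly
    m:   number of random samples (m >= 1) drawn from the remaining points
    rng: a numpy.random.Generator
    """
    n = X.shape[0]
    k = min(k, n)

    # Stand-in for the ANN subroutine: brute-force exact k-nearest neighbors.
    d2 = np.sum((X - q) ** 2, axis=1)
    nn_idx = np.argpartition(d2, k - 1)[:k] if k > 0 else np.empty(0, dtype=int)

    # Exact contribution of the nearest (largest-contribution) points.
    near_sum = gaussian_kernel(q, X[nn_idx], h).sum()

    # Indices of the points that were not returned as neighbors.
    rest_mask = np.ones(n, dtype=bool)
    rest_mask[nn_idx] = False
    rest_idx = np.flatnonzero(rest_mask)
    if rest_idx.size == 0:
        return near_sum / n

    # Random sampling over the remainder, scaled by its size (unbiased).
    sample = rng.choice(rest_idx, size=m, replace=True)
    far_sum = rest_idx.size * gaussian_kernel(q, X[sample], h).mean()

    return (near_sum + far_sum) / n
```

Setting k = 0 reduces the sketch to plain random sampling, and setting k = n reduces it to the exact KDE, which gives two simple sanity checks. A real ANN index may miss some true neighbors. The sketch stays unbiased in that case as long as the sampled remainder is defined as every point the index did not return.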
Original language: English
Title: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics
Publisher: PMLR
Publication date: 2022
Pages: 3108-3137
Status: Published - 2022
Event: 25th International Conference on Artificial Intelligence and Statistics (AISTATS) - Virtual, Unknown
Duration: 28 Mar 2022 to 30 Mar 2022

Exhibition

Exhibition: 25th International Conference on Artificial Intelligence and Statistics (AISTATS)
Country: Unknown
City: Virtual
Period: 28/03/2022 to 30/03/2022
Name: Proceedings of Machine Learning Research
Volume: 151
ISSN: 2640-3498

ID: 340695306