Confirmation sampling for exact nearest neighbor search

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Standard

Confirmation sampling for exact nearest neighbor search. / Christiani, Tobias; Pagh, Rasmus; Thorup, Mikkel.

Similarity Search and Applications - 13th International Conference, SISAP 2020, Proceedings. red. / Shin’ichi Satoh; Lucia Vadicamo; Fabio Carrara; Arthur Zimek; Ilaria Bartolini; Martin Aumüller; Bjorn Por Jonsson; Rasmus Pagh. Springer, 2020. s. 97-110 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Bind 12440 LNCS).

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Harvard

Christiani, T, Pagh, R & Thorup, M 2020, Confirmation sampling for exact nearest neighbor search. i S Satoh, L Vadicamo, F Carrara, A Zimek, I Bartolini, M Aumüller, BP Jonsson & R Pagh (red), Similarity Search and Applications - 13th International Conference, SISAP 2020, Proceedings. Springer, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), bind 12440 LNCS, s. 97-110, 13th International Conference on Similarity Search and Applications, SISAP 2020, Copenhagen, Danmark, 30/09/2020. https://doi.org/10.1007/978-3-030-60936-8_8

APA

Christiani, T., Pagh, R., & Thorup, M. (2020). Confirmation sampling for exact nearest neighbor search. I S. Satoh, L. Vadicamo, F. Carrara, A. Zimek, I. Bartolini, M. Aumüller, B. P. Jonsson, & R. Pagh (red.), Similarity Search and Applications - 13th International Conference, SISAP 2020, Proceedings (s. 97-110). Springer. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Bind 12440 LNCS https://doi.org/10.1007/978-3-030-60936-8_8

Vancouver

Christiani T, Pagh R, Thorup M. Confirmation sampling for exact nearest neighbor search. I Satoh S, Vadicamo L, Carrara F, Zimek A, Bartolini I, Aumüller M, Jonsson BP, Pagh R, red., Similarity Search and Applications - 13th International Conference, SISAP 2020, Proceedings. Springer. 2020. s. 97-110. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Bind 12440 LNCS). https://doi.org/10.1007/978-3-030-60936-8_8

Author

Christiani, Tobias ; Pagh, Rasmus ; Thorup, Mikkel. / Confirmation sampling for exact nearest neighbor search. Similarity Search and Applications - 13th International Conference, SISAP 2020, Proceedings. red. / Shin’ichi Satoh ; Lucia Vadicamo ; Fabio Carrara ; Arthur Zimek ; Ilaria Bartolini ; Martin Aumüller ; Bjorn Por Jonsson ; Rasmus Pagh. Springer, 2020. s. 97-110 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Bind 12440 LNCS).

Bibtex

@inproceedings{ff4f862b676d45ed8b3f3bae1dea76a1,
title = "Confirmation sampling for exact nearest neighbor search",
abstract = "Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC {\textquoteright}98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings. We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures that each find the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability $$1-\delta $$, using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of $$\varOmega (L)$$ trees with internal parameters specifically tuned to the query and data.",
keywords = "Locality-sensitive hashing, Nearest neighbor search",
author = "Tobias Christiani and Rasmus Pagh and Mikkel Thorup",
year = "2020",
doi = "10.1007/978-3-030-60936-8_8",
language = "English",
isbn = "9783030609351",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "97--110",
editor = "Shin{\textquoteright}ichi Satoh and Lucia Vadicamo and Fabio Carrara and Arthur Zimek and Ilaria Bartolini and Martin Aum{\"u}ller and Jonsson, {Bjorn Por} and Rasmus Pagh",
booktitle = "Similarity Search and Applications - 13th International Conference, SISAP 2020, Proceedings",
address = "Switzerland",
note = "13th International Conference on Similarity Search and Applications, SISAP 2020 ; Conference date: 30-09-2020 Through 02-10-2020",

}

RIS

TY - GEN

T1 - Confirmation sampling for exact nearest neighbor search

AU - Christiani, Tobias

AU - Pagh, Rasmus

AU - Thorup, Mikkel

PY - 2020

Y1 - 2020

N2 - Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings. We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures that each find the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability $$1-\delta $$, using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of $$\varOmega (L)$$ trees with internal parameters specifically tuned to the query and data.

AB - Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings. We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures that each find the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability $$1-\delta $$, using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of $$\varOmega (L)$$ trees with internal parameters specifically tuned to the query and data.

KW - Locality-sensitive hashing

KW - Nearest neighbor search

U2 - 10.1007/978-3-030-60936-8_8

DO - 10.1007/978-3-030-60936-8_8

M3 - Article in proceedings

AN - SCOPUS:85093858448

SN - 9783030609351

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 97

EP - 110

BT - Similarity Search and Applications - 13th International Conference, SISAP 2020, Proceedings

A2 - Satoh, Shin’ichi

A2 - Vadicamo, Lucia

A2 - Carrara, Fabio

A2 - Zimek, Arthur

A2 - Bartolini, Ilaria

A2 - Aumüller, Martin

A2 - Jonsson, Bjorn Por

A2 - Pagh, Rasmus

PB - Springer

T2 - 13th International Conference on Similarity Search and Applications, SISAP 2020

Y2 - 30 September 2020 through 2 October 2020

ER -

ID: 258500798