Dansk Selskab for Datalogi presents a talk

Similarity Search

Talk by Postdoc Ninh Dang Pham, IT Universitetet


Estimating set similarity is central to many internet search applications, for example to determine whether two web documents are fully or partly identical. We present a hashing-based search technique using odd sketches. Odd sketches are a simple and efficient estimator of the so-called Jaccard similarity between two sets. The Jaccard similarity of two sets is the number of elements in their intersection divided by the number of elements in their union, J(A, B) = |A ∩ B| / |A ∪ B|. The talk presents a theoretical analysis of the quality of the odd sketch estimation. The talk is based on the research paper “Efficient estimation for high similarities using odd sketches” by Michael Mitzenmacher, Rasmus Pagh and Ninh Pham, presented at the 23rd International World Wide Web Conference (WWW 2014) in Seoul, Korea.
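To make the abstract concrete, here is a minimal Python sketch of the idea. It is not the construction from the paper: it hashes the raw set elements directly, stores each set size alongside its sketch, and uses SHA-256 as a stand-in hash function. These are all assumptions made for brevity, and the names `jaccard`, `odd_sketch` and `estimate_jaccard` are hypothetical. The point it illustrates is that an odd sketch keeps one parity bit per bucket, so the XOR of two sketches is the sketch of the symmetric difference, whose size can be estimated from the number of set bits.

```python
import hashlib
import math


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(union)


def _bucket(item, n_bits):
    # Stand-in hash function (assumption): SHA-256 of repr(item), reduced mod n_bits.
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def odd_sketch(items, n_bits=1024):
    """Bit i holds the parity of the number of elements hashed to bucket i."""
    sketch = 0
    for item in set(items):
        sketch ^= 1 << _bucket(item, n_bits)  # flip the bit for this element
    return sketch


def estimate_jaccard(sketch_a, size_a, sketch_b, size_b, n_bits=1024):
    """Estimate Jaccard similarity from two odd sketches and the set sizes."""
    # XOR of the sketches equals the odd sketch of the symmetric difference.
    z = bin(sketch_a ^ sketch_b).count("1")
    if 2 * z >= n_bits:
        return 0.0  # sketch saturated; fallback chosen for this example
    # Invert E[z] ~ (n/2) * (1 - exp(-2m/n)) to estimate m = |A Δ B|.
    diff = -(n_bits / 2) * math.log(1 - 2 * z / n_bits)
    total = size_a + size_b
    if total == 0:
        return 1.0
    # |A ∩ B| = (total - diff) / 2 and |A ∪ B| = (total + diff) / 2.
    return max(0.0, min(1.0, (total - diff) / (total + diff)))
```

For example, two sets of 1000 integers that overlap in 950 elements have exact Jaccard similarity 950/1050, and `estimate_jaccard(odd_sketch(a), len(a), odd_sketch(b), len(b))` should land close to that value using only 1024 bits per set. The estimate is sharpest when the symmetric difference is small relative to the sketch size, which matches the talk's focus on high similarities.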