Adaptive distributional extensions to DFR ranking

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Divergence From Randomness (DFR) ranking models assume that informative terms are distributed differently in a corpus than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. In practice, this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and to be comparable in performance to a query likelihood language model (LM).
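The abstract describes a two-step procedure: fit candidate non-informative (background) distributions to the collection's term statistics, select the best-fitting one, and score a term by how surprising its observed frequency is under that distribution. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's actual estimator: it fits Poisson and geometric models to a term's per-document frequencies by maximum likelihood, picks the better fit by log-likelihood, and computes a DFR-style information content -log2 P(tf | best model). All function names and the model-selection criterion are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def best_fit_model(doc_tfs):
    """Fit candidate non-informative distributions to a term's
    per-document frequencies and return the better-fitting one.
    Illustrative only: MLE fits compared by log-likelihood."""
    doc_tfs = np.asarray(doc_tfs, dtype=float)

    # Poisson: the MLE of the rate is the sample mean.
    lam = doc_tfs.mean()
    poisson_ll = stats.poisson.logpmf(doc_tfs, lam).sum()

    # Geometric on {0, 1, 2, ...}: MLE p = 1 / (1 + mean).
    # scipy's geom has support starting at 1, so shift counts by +1.
    p = 1.0 / (1.0 + doc_tfs.mean())
    geom_ll = stats.geom.logpmf(doc_tfs + 1, p).sum()

    return ("poisson", lam) if poisson_ll >= geom_ll else ("geometric", p)

def dfr_inf1(tf, model):
    """DFR-style first information content: -log2 P(tf | background model)."""
    name, param = model
    if name == "poisson":
        logp = stats.poisson.logpmf(tf, param)
    else:
        logp = stats.geom.logpmf(tf + 1, param)
    return -logp / np.log(2)  # natural log to bits

# Hypothetical example: a term frequency that is unlikely under the
# fitted background model receives a high information content.
tfs = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]
model = best_fit_model(tfs)
print(model, dfr_inf1(5, model))
```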
Original language: English
Title: Proceedings of the 25th ACM International Conference on Information and Knowledge Management
Number of pages: 4
Publisher: Association for Computing Machinery
Publication date: 2016
Pages: 2005-2008
ISBN (Electronic): 978-1-4503-4073-1
DOI
Status: Published - 2016
Event: 25th ACM International Conference on Information and Knowledge Management - Indianapolis, USA
Duration: 24 Oct 2016 - 28 Oct 2016
Conference number: 25

Conference

Conference: 25th ACM International Conference on Information and Knowledge Management
Number: 25
Country: USA
City: Indianapolis
Period: 24/10/2016 - 28/10/2016

    Research areas

  • cs.IR

ID: 167474998