On the Estimation and Use of Statistical Modelling in Information Retrieval

Research output: Book/Report › Ph.D. thesis › Research

Standard

On the Estimation and Use of Statistical Modelling in Information Retrieval. / Petersen, Casper.

Department of Computer Science, Faculty of Science, University of Copenhagen, 2016.

Harvard

Petersen, C 2016, On the Estimation and Use of Statistical Modelling in Information Retrieval. Department of Computer Science, Faculty of Science, University of Copenhagen. <https://soeg.kb.dk/permalink/45KBDK_KGL/1f0go08/cdi_arxiv_primary_1904_00289>

APA

Petersen, C. (2016). On the Estimation and Use of Statistical Modelling in Information Retrieval. Department of Computer Science, Faculty of Science, University of Copenhagen. https://soeg.kb.dk/permalink/45KBDK_KGL/1f0go08/cdi_arxiv_primary_1904_00289

Vancouver

Petersen C. On the Estimation and Use of Statistical Modelling in Information Retrieval. Department of Computer Science, Faculty of Science, University of Copenhagen, 2016.

Author

Petersen, Casper. / On the Estimation and Use of Statistical Modelling in Information Retrieval. Department of Computer Science, Faculty of Science, University of Copenhagen, 2016.

Bibtex

@phdthesis{137900ea11cb4c0ea15e7205f66e9876,
title = "On the Estimation and Use of Statistical Modelling in Information Retrieval",
abstract = "Automatic text processing often relies on assumptions about the distribution of some property (such as term frequency) in the data being processed. In information retrieval (IR) such assumptions may be attributed to (i) the absence of principled approaches for determining the correct statistical distribution, and (ii) the fact that making such assumptions does not seem to impact IR effectiveness. However, if such assumptions are not validated, any subsequent calculations, deductions or modelling become less accurate for the task at hand. To remove the need for such assumptions, this thesis first introduces a statistically principled method for selecting the best-fitting distribution. The thesis then demonstrates that integrating knowledge about the best-fitting distribution into IR leads to superior results compared to existing strong baselines on multiple datasets. Overall, this thesis concludes that assumptions regarding the distribution of dataset properties can be replaced with an effective, efficient and principled method for determining the best-fitting distribution and that using this distribution can lead to improved retrieval performance.",
author = "Casper Petersen",
year = "2016",
language = "English",
publisher = "Department of Computer Science, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - On the Estimation and Use of Statistical Modelling in Information Retrieval

AU - Petersen, Casper

PY - 2016

Y1 - 2016

N2 - Automatic text processing often relies on assumptions about the distribution of some property (such as term frequency) in the data being processed. In information retrieval (IR) such assumptions may be attributed to (i) the absence of principled approaches for determining the correct statistical distribution, and (ii) the fact that making such assumptions does not seem to impact IR effectiveness. However, if such assumptions are not validated, any subsequent calculations, deductions or modelling become less accurate for the task at hand. To remove the need for such assumptions, this thesis first introduces a statistically principled method for selecting the best-fitting distribution. The thesis then demonstrates that integrating knowledge about the best-fitting distribution into IR leads to superior results compared to existing strong baselines on multiple datasets. Overall, this thesis concludes that assumptions regarding the distribution of dataset properties can be replaced with an effective, efficient and principled method for determining the best-fitting distribution and that using this distribution can lead to improved retrieval performance.

AB - Automatic text processing often relies on assumptions about the distribution of some property (such as term frequency) in the data being processed. In information retrieval (IR) such assumptions may be attributed to (i) the absence of principled approaches for determining the correct statistical distribution, and (ii) the fact that making such assumptions does not seem to impact IR effectiveness. However, if such assumptions are not validated, any subsequent calculations, deductions or modelling become less accurate for the task at hand. To remove the need for such assumptions, this thesis first introduces a statistically principled method for selecting the best-fitting distribution. The thesis then demonstrates that integrating knowledge about the best-fitting distribution into IR leads to superior results compared to existing strong baselines on multiple datasets. Overall, this thesis concludes that assumptions regarding the distribution of dataset properties can be replaced with an effective, efficient and principled method for determining the best-fitting distribution and that using this distribution can lead to improved retrieval performance.

UR - https://soeg.kb.dk/permalink/45KBDK_KGL/1f0go08/cdi_arxiv_primary_1904_00289

M3 - Ph.D. thesis

BT - On the Estimation and Use of Statistical Modelling in Information Retrieval

PB - Department of Computer Science, Faculty of Science, University of Copenhagen

ER -

ID: 172264131