How to Measure the Reproducibility of System-oriented IR Experiments
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
How to Measure the Reproducibility of System-oriented IR Experiments. / Breuer, Timo; Ferro, Nicola; Fuhr, Norbert; Maistro, Maria; Sakai, Tetsuya; Schaer, Philipp; Soboroff, Ian.
SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 2020. p. 349-358.
RIS
TY - GEN
T1 - How to Measure the Reproducibility of System-oriented IR Experiments
AU - Breuer, Timo
AU - Ferro, Nicola
AU - Fuhr, Norbert
AU - Maistro, Maria
AU - Sakai, Tetsuya
AU - Schaer, Philipp
AU - Soboroff, Ian
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020
Y1 - 2020
AB - Replicability and reproducibility of experimental results are primary concerns in all the areas of science and IR is not an exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we do not have any means to assess when reproduced is reproduced. Moreover, we lack any reproducibility-oriented dataset, which would allow us to develop such methods. To address these issues, we compare several measures to objectively quantify to what extent we have replicated or reproduced a system-oriented IR experiment. These measures operate at different levels of granularity, from the fine-grained comparison of ranked lists, to the more general comparison of the obtained effects and significant differences. Moreover, we also develop a reproducibility-oriented dataset, which allows us to validate our measures and which can also be used to develop future measures.
KW - measure
KW - replicability
KW - reproducibility
U2 - 10.1145/3397271.3401036
DO - 10.1145/3397271.3401036
M3 - Article in proceedings
AN - SCOPUS:85090158838
SP - 349
EP - 358
BT - SIGIR '20
PB - Association for Computing Machinery
T2 - 43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020
Y2 - 25 July 2020 through 30 July 2020
ER -
ID: 269912561