AWARE: Exploiting evaluation measures to combine multiple assessors
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
AWARE : Exploiting evaluation measures to combine multiple assessors. / Ferrante, Marco; Ferro, Nicola; Maistro, Maria.
In: ACM Transactions on Information Systems, Vol. 36, No. 2, 20, 01.08.2017.
RIS
TY - JOUR
T1 - AWARE
T2 - Exploiting evaluation measures to combine multiple assessors
AU - Ferrante, Marco
AU - Ferro, Nicola
AU - Maistro, Maria
PY - 2017/8/1
Y1 - 2017/8/1
N2 - We propose the Assessor-driven Weighted Averages for Retrieval Evaluation (AWARE) probabilistic framework, a novel methodology for dealing with multiple crowd assessors that may be contradictory and/or noisy. By modeling relevance judgements and crowd assessors as sources of uncertainty, AWARE takes the expectation of a generic performance measure, like Average Precision, composed with these random variables. In this way, it approaches the problem of aggregating different crowd assessors from a new perspective, that is, directly combining the performance measures computed on the ground truth generated by the crowd assessors instead of adopting some classification technique to merge the labels produced by them. We propose several unsupervised estimators that instantiate the AWARE framework and we compare them with state-of-the-art approaches, that is, Majority Vote and Expectation Maximization, on TREC collections. We found that the AWARE approaches are better at correctly ranking systems and predicting their actual performance scores.
AB - We propose the Assessor-driven Weighted Averages for Retrieval Evaluation (AWARE) probabilistic framework, a novel methodology for dealing with multiple crowd assessors that may be contradictory and/or noisy. By modeling relevance judgements and crowd assessors as sources of uncertainty, AWARE takes the expectation of a generic performance measure, like Average Precision, composed with these random variables. In this way, it approaches the problem of aggregating different crowd assessors from a new perspective, that is, directly combining the performance measures computed on the ground truth generated by the crowd assessors instead of adopting some classification technique to merge the labels produced by them. We propose several unsupervised estimators that instantiate the AWARE framework and we compare them with state-of-the-art approaches, that is, Majority Vote and Expectation Maximization, on TREC collections. We found that the AWARE approaches are better at correctly ranking systems and predicting their actual performance scores.
KW - AWARE
KW - Crowdsourcing
KW - Performance measure
KW - Unsupervised estimators
KW - Weighted average
UR - http://www.scopus.com/inward/record.url?scp=85028669422&partnerID=8YFLogxK
U2 - 10.1145/3110217
DO - 10.1145/3110217
M3 - Journal article
AN - SCOPUS:85028669422
VL - 36
JO - ACM Transactions on Information Systems
JF - ACM Transactions on Information Systems
SN - 1046-8188
IS - 2
M1 - 20
ER -
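The abstract contrasts AWARE's strategy (score against each assessor's ground truth, then take a weighted average of the measures) with label-merging baselines such as Majority Vote. The following is a minimal Python sketch of that contrast only; it is not the paper's actual estimators, and the function names, uniform weights, and toy data are illustrative assumptions.

```python
# Toy sketch: AWARE-style weighted average of per-assessor Average
# Precision vs. a Majority Vote baseline that merges labels first.
# All names, weights, and data below are illustrative, not from the paper.

def average_precision(ranking, qrels):
    """AP of a ranked list of doc ids against a binary relevance dict."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0):
            hits += 1
            precisions.append(hits / rank)
    total_rel = sum(qrels.values())
    return sum(precisions) / total_rel if total_rel else 0.0

def aware_score(ranking, assessor_qrels, weights):
    """Weighted average of the measure over per-assessor ground truths."""
    return sum(w * average_precision(ranking, q)
               for q, w in zip(assessor_qrels, weights))

def majority_vote(assessor_qrels):
    """Baseline: merge labels into one ground truth, then score once."""
    docs = set().union(*assessor_qrels)
    n = len(assessor_qrels)
    return {d: int(sum(q.get(d, 0) for q in assessor_qrels) > n / 2)
            for d in docs}

# Three crowd assessors judging the same four documents (toy data).
ranking = ["d1", "d2", "d3", "d4"]
qrels_a = {"d1": 1, "d2": 0, "d3": 1, "d4": 0}
qrels_b = {"d1": 1, "d2": 1, "d3": 0, "d4": 0}
qrels_c = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}

# Uniform weights here; AWARE's unsupervised estimators would instead
# weight assessors by their estimated quality.
score = aware_score(ranking, [qrels_a, qrels_b, qrels_c], [1 / 3] * 3)
mv_score = average_precision(ranking,
                             majority_vote([qrels_a, qrels_b, qrels_c]))
```

With this toy data the two strategies disagree (the weighted average is 0.75, the Majority Vote score is 5/6), which is the point of the comparison: combining measures and combining labels are genuinely different aggregations.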