Unsupervised Evaluation for Question Answering with Transformers
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
Documents
- Unsupervised Evaluation for Question Answering with Transformers
Publisher's published version, 449 KB, PDF document
It is challenging to automatically evaluate the answer of a QA model at inference time. Although many models provide confidence scores, and simple heuristics can go a long way towards indicating answer correctness, such measures are heavily dataset-dependent and are unlikely to generalise. In this work, we begin by investigating the hidden representations of questions, answers, and contexts in transformer-based QA architectures. We observe a consistent pattern in the answer representations, which we show can be used to automatically evaluate whether or not a predicted answer span is correct. Our method does not require any labelled data and outperforms strong heuristic baselines, across 2 datasets and 7 domains. We are able to predict whether or not a model’s answer is correct with 91.37% accuracy on SQuAD, and 80.7% accuracy on SubjQA. We expect that this method will have broad applications, e.g., in semi-automatic development of QA datasets.
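The abstract does not spell out the statistic computed over the answer representations, but the general idea (scoring a predicted span from hidden states alone, with no labelled data) can be sketched. Below is a minimal, hypothetical illustration: pool the per-token hidden states of the question and of the predicted answer span, and use their cosine similarity as a label-free correctness signal. The function name, pooling choice, and similarity measure are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def answer_confidence(hidden: np.ndarray,
                      question_span: tuple[int, int],
                      answer_span: tuple[int, int]) -> float:
    """Score a predicted answer span from hidden states alone.

    `hidden` is a (seq_len, dim) matrix of per-token hidden states
    from one layer of a transformer QA model; spans are half-open
    [start, end) token indices. This is an illustrative stand-in,
    not the scoring function from the paper.
    """
    # Mean-pool the hidden states over each span.
    q = hidden[question_span[0]:question_span[1]].mean(axis=0)
    a = hidden[answer_span[0]:answer_span[1]].mean(axis=0)
    # Cosine similarity between the pooled question and answer
    # vectors serves as an unsupervised "does this span fit?" score.
    return float(q @ a / (np.linalg.norm(q) * np.linalg.norm(a)))

# Toy example with random arrays standing in for real model output.
rng = np.random.default_rng(0)
states = rng.normal(size=(32, 768))          # seq_len=32, dim=768
score = answer_confidence(states, (1, 9), (15, 20))
print(f"unsupervised answer score: {score:.3f}")
```

In practice the hidden-state matrix would come from a QA model run with hidden-state outputs enabled, and a threshold on the score (tuned without gold labels, e.g. by clustering) would separate likely-correct from likely-incorrect answers.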
Original language | English |
---|---|
Title | Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP |
Publisher | Association for Computational Linguistics |
Publication date | 2020 |
Pages | 83-90 |
DOI | |
Status | Published - 2020 |
Event | The 2020 Conference on Empirical Methods in Natural Language Processing - online. Duration: 16 Nov 2020 → 20 Nov 2020. http://2020.emnlp.org |
Conference

Conference | The 2020 Conference on Empirical Methods in Natural Language Processing |
---|---|
Location | online |
Period | 16/11/2020 → 20/11/2020 |
Internet address | http://2020.emnlp.org |
ID: 254996871