Unsupervised Evaluation for Question Answering with Transformers

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Documents

Unsupervised Evaluation for Question Answering with Transformers
Final published version, 449 KB, PDF document

Lukas Muttenthaler
Isabelle Augenstein
Johannes Bjerva

It is challenging to automatically evaluate the answer of a QA model at inference time. Although many models provide confidence scores, and simple heuristics can go a long way towards indicating answer correctness, such measures are heavily dataset-dependent and are unlikely to generalise. In this work, we begin by investigating the hidden representations of questions, answers, and contexts in transformer-based QA architectures. We observe a consistent pattern in the answer representations, which we show can be used to automatically evaluate whether or not a predicted answer span is correct. Our method does not require any labelled data and outperforms strong heuristic baselines, across 2 datasets and 7 domains. We are able to predict whether or not a model’s answer is correct with 91.37% accuracy on SQuAD, and 80.7% accuracy on SubjQA. We expect that this method will have broad applications, e.g., in semi-automatic development of QA datasets.

Original language	English
Title of host publication	Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Publisher	Association for Computational Linguistics
Publication date	2020
Pages	83-90
DOIs	https://doi.org/10.18653/v1/2020.blackboxnlp-1.8
Publication status	Published - 2020
Event	The 2020 Conference on Empirical Methods in Natural Language Processing, online. Duration: 16 Nov 2020 → 20 Nov 2020. http://2020.emnlp.org

Conference

Conference	The 2020 Conference on Empirical Methods in Natural Language Processing
Location	online
Period	16/11/2020 → 20/11/2020
Web address	http://2020.emnlp.org

ID: 254996871

Department of Computer Science