QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

QA Dataset Explosion : A Taxonomy of NLP Resources for Question Answering and Reading Comprehension. / Rogers, Anna; Gardner, Matt; Augenstein, Isabelle.

In: ACM Computing Surveys, Vol. 55, No. 10, 197, 2023.


Harvard

Rogers, A, Gardner, M & Augenstein, I 2023, 'QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension', ACM Computing Surveys, vol. 55, no. 10, 197. https://doi.org/10.1145/3560260

APA

Rogers, A., Gardner, M., & Augenstein, I. (2023). QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension. ACM Computing Surveys, 55(10), Article 197. https://doi.org/10.1145/3560260

Vancouver

Rogers A, Gardner M, Augenstein I. QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension. ACM Computing Surveys. 2023;55(10):197. https://doi.org/10.1145/3560260

Author

Rogers, Anna ; Gardner, Matt ; Augenstein, Isabelle. / QA Dataset Explosion : A Taxonomy of NLP Resources for Question Answering and Reading Comprehension. In: ACM Computing Surveys. 2023 ; Vol. 55, No. 10.

BibTeX

@article{7493e79b41fe44b394eb09cc9f9c288a,
title = "QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension",
abstract = "Alongside huge volumes of research on deep learning models in NLP in the recent years, there has been much work on benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with more than 80 new datasets appearing in the past 2 years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of {"}skills{"} that question answering/reading comprehension systems are supposed to acquire and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of overfocusing on English. The study is aimed at both practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.",
keywords = "natural language understanding, Reading comprehension",
author = "Anna Rogers and Matt Gardner and Isabelle Augenstein",
note = "Publisher Copyright: {\textcopyright} 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.",
year = "2023",
doi = "10.1145/3560260",
language = "English",
volume = "55",
journal = "ACM Computing Surveys",
issn = "0360-0300",
publisher = "Association for Computing Machinery, Inc.",
number = "10",

}

RIS

TY - JOUR

T1 - QA Dataset Explosion

T2 - A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

AU - Rogers, Anna

AU - Gardner, Matt

AU - Augenstein, Isabelle

N1 - Publisher Copyright: © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.

PY - 2023

Y1 - 2023

N2 - Alongside huge volumes of research on deep learning models in NLP in the recent years, there has been much work on benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with more than 80 new datasets appearing in the past 2 years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "skills" that question answering/reading comprehension systems are supposed to acquire and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of overfocusing on English. The study is aimed at both practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.

AB - Alongside huge volumes of research on deep learning models in NLP in the recent years, there has been much work on benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with more than 80 new datasets appearing in the past 2 years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "skills" that question answering/reading comprehension systems are supposed to acquire and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of overfocusing on English. The study is aimed at both practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.

KW - natural language understanding

KW - Reading comprehension

U2 - 10.1145/3560260

DO - 10.1145/3560260

M3 - Journal article

AN - SCOPUS:85147798618

VL - 55

JO - ACM Computing Surveys

JF - ACM Computing Surveys

SN - 0360-0300

IS - 10

M1 - 197

ER -
