Sociolectal Analysis of Pretrained Language Models
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
Documents
- Fulltext
Publisher's published version, 380 KB, PDF document
Using data from English cloze tests, in which subjects also self-reported their gender, age, education, and race, we examine performance differences of pretrained language models across demographic groups, defined by these (protected) attributes. We demonstrate wide performance gaps across demographic groups and show that pretrained language models systematically disfavor young non-white male speakers; that is, pretrained language models not only learn social biases (stereotypical associations), they also learn sociolectal biases, learning to speak more like some speakers than others. We show, however, that, with the exception of BERT models, larger pretrained language models reduce some of the performance gaps between majority and minority groups.
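To make the evaluation setup concrete, here is a minimal sketch, not the paper's code, of how one might score a masked language model on cloze items and aggregate accuracy by the writer's self-reported demographic group. The dataset records (fields `sentence`, `answer`, `group`) and the example items are illustrative assumptions; only the Hugging Face `transformers` calls are real.

```python
# Minimal sketch (not the authors' implementation): per-group cloze
# accuracy of a masked language model. Dataset fields are hypothetical.
from collections import defaultdict

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-uncased"  # any masked-LM checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()


def cloze_correct(sentence: str, answer: str) -> bool:
    """True if the model's top prediction for the gap matches the answer.

    `sentence` must contain a single '___' marking the cloze gap;
    for simplicity, only single-token answers are handled.
    """
    text = sentence.replace("___", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(
        as_tuple=True
    )[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_id = logits[0, mask_pos].argmax(dim=-1)
    return tokenizer.decode(predicted_id).strip() == answer.lower()


# Hypothetical cloze items paired with the writer's self-reported group.
items = [
    {"sentence": "She poured the coffee into her ___.", "answer": "cup",
     "group": "young/male"},
    {"sentence": "He parked the car in the ___.", "answer": "garage",
     "group": "older/female"},
]

hits, totals = defaultdict(int), defaultdict(int)
for item in items:
    totals[item["group"]] += 1
    hits[item["group"]] += cloze_correct(item["sentence"], item["answer"])

# The performance gap is the spread in per-group accuracy.
for group in totals:
    print(f"{group}: accuracy = {hits[group] / totals[group]:.2f}")
```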
| Original language | English |
|---|---|
| Title | Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing |
| Publisher | Association for Computational Linguistics |
| Publication date | 2021 |
| Pages | 4581–4588 |
| DOI | |
| Status | Published - 2021 |
| Event | 2021 Conference on Empirical Methods in Natural Language Processing - Duration: 7 Nov 2021 → 11 Nov 2021 |
Conference

| Conference | 2021 Conference on Empirical Methods in Natural Language Processing |
|---|---|
| Period | 07/11/2021 → 11/11/2021 |
ID: 299822479