Quantifying gender bias towards politicians in cross-lingual language models

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Quantifying gender bias towards politicians in cross-lingual language models. / Stańczak, Karolina; Choudhury, Sagnik Ray; Pimentel, Tiago; Cotterell, Ryan; Augenstein, Isabelle.

In: PLoS ONE, Vol. 18, No. 11 November, e0277640, 2023, p. 1-24.


Harvard

Stańczak, K, Choudhury, SR, Pimentel, T, Cotterell, R & Augenstein, I 2023, 'Quantifying gender bias towards politicians in cross-lingual language models', PLoS ONE, vol. 18, no. 11 November, e0277640, pp. 1-24. https://doi.org/10.1371/journal.pone.0277640

APA

Stańczak, K., Choudhury, S. R., Pimentel, T., Cotterell, R., & Augenstein, I. (2023). Quantifying gender bias towards politicians in cross-lingual language models. PLoS ONE, 18(11 November), 1-24. [e0277640]. https://doi.org/10.1371/journal.pone.0277640

Vancouver

Stańczak K, Choudhury SR, Pimentel T, Cotterell R, Augenstein I. Quantifying gender bias towards politicians in cross-lingual language models. PLoS ONE. 2023;18(11 November):1-24. e0277640. https://doi.org/10.1371/journal.pone.0277640

Author

Stańczak, Karolina ; Choudhury, Sagnik Ray ; Pimentel, Tiago ; Cotterell, Ryan ; Augenstein, Isabelle. / Quantifying gender bias towards politicians in cross-lingual language models. In: PLoS ONE. 2023 ; Vol. 18, No. 11 November. pp. 1-24.

Bibtex

@article{21a438ff930e4914b7d4544a73982a15,
title = "Quantifying gender bias towards politicians in cross-lingual language models",
abstract = "Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models{\textquoteright} stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.",
author = "Karolina Sta{\'n}czak and Choudhury, {Sagnik Ray} and Tiago Pimentel and Ryan Cotterell and Isabelle Augenstein",
note = "Publisher Copyright: {\textcopyright} 2023 Sta{\'n}czak et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",
year = "2023",
doi = "10.1371/journal.pone.0277640",
language = "English",
volume = "18",
pages = "1--24",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "11 November",
}

RIS

TY - JOUR

T1 - Quantifying gender bias towards politicians in cross-lingual language models

AU - Stańczak, Karolina

AU - Choudhury, Sagnik Ray

AU - Pimentel, Tiago

AU - Cotterell, Ryan

AU - Augenstein, Isabelle

N1 - Publisher Copyright: © 2023 Stańczak et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2023

Y1 - 2023

N2 - Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models’ stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.

AB - Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models’ stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.

U2 - 10.1371/journal.pone.0277640

DO - 10.1371/journal.pone.0277640

M3 - Journal article

C2 - 38015835

AN - SCOPUS:85178494544

VL - 18

SP - 1

EP - 24

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 11 November

M1 - e0277640

ER -