Measuring Intersectional Biases in Historical Documents

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; peer-reviewed

Documents

  • Fulltext

    Publisher's published version, 1.55 MB, PDF document

Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in an archaic language. In this paper, we investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries). Our analyses are performed along the axes of gender, race, and their intersection. We examine these biases by conducting a temporal study in which we measure the development of lexical associations using distributional semantics models and word embeddings. Further, we evaluate the effectiveness of techniques designed to process OCR-generated data and assess their stability when trained on and applied to the noisy historical newspapers. We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset. We provide evidence that gender and racial biases are interdependent, and their intersection triggers distinct effects. These findings align with the theory of intersectionality, which stresses that biases affecting people with multiple marginalised identities compound to more than the sum of their constituents.
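The temporal measurement of lexical associations mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a generic cosine-based association score, and the period labels, words, and two-dimensional vectors are hypothetical toy values chosen only to make the example self-contained.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(target, attrs_a, attrs_b, emb):
    """Mean similarity of `target` to attribute set A minus that to set B.

    A positive score means the target sits closer to set A in this
    embedding space; a negative score means it sits closer to set B.
    """
    sim_a = sum(cosine(emb[target], emb[w]) for w in attrs_a) / len(attrs_a)
    sim_b = sum(cosine(emb[target], emb[w]) for w in attrs_b) / len(attrs_b)
    return sim_a - sim_b

# Hypothetical toy embeddings, one vocabulary per time slice (assumption:
# in practice each slice would be trained on that period's newspapers).
embeddings_by_period = {
    "period_1": {"nurse": [0.9, 0.1], "woman": [1.0, 0.0], "man": [0.0, 1.0]},
    "period_2": {"nurse": [0.6, 0.8], "woman": [1.0, 0.0], "man": [0.0, 1.0]},
}

# Track how the association of one target word shifts between periods.
scores = {
    period: association("nurse", ["woman"], ["man"], emb)
    for period, emb in embeddings_by_period.items()
}

for period, score in scores.items():
    print(f"{period}: {score:+.3f}")
```

Comparing the score across periods gives a rough picture of how an association drifts over time. With OCR-noisy text, such scores may vary between training runs, which is the stability concern the abstract raises.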
Original language: English
Title: Findings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Volume: ACL 2023
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2023
Pages: 2711–2730
Status: Published - 2023
Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 2023 to 14 Jul 2023

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country: Canada
City: Toronto
Period: 09/07/2023 to 14/07/2023
Sponsor: Bloomberg Engineering, et al., Google Research, Liveperson, Meta, Microsoft

ID: 375982123