Measuring Intersectional Biases in Historical Documents

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Measuring Intersectional Biases in Historical Documents. / Borenstein, Nadav; Stanczak, Karolina Ewa; Rolskov, Thea; da Silva Perez, Natália; Klein Kafer, Natacha; Augenstein, Isabelle.

Findings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023. Vol. ACL 2023. Association for Computational Linguistics (ACL), 2023. p. 2711–2730.

Harvard

Borenstein, N, Stanczak, KE, Rolskov, T, da Silva Perez, N, Klein Kafer, N & Augenstein, I 2023, Measuring Intersectional Biases in Historical Documents. in Findings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023. vol. ACL 2023, Association for Computational Linguistics (ACL), pp. 2711–2730, 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Toronto, Canada, 09/07/2023. <https://aclanthology.org/2023.findings-acl.170/>

APA

Borenstein, N., Stanczak, K. E., Rolskov, T., da Silva Perez, N., Klein Kafer, N., & Augenstein, I. (2023). Measuring Intersectional Biases in Historical Documents. In Findings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 (Vol. ACL 2023, pp. 2711–2730). Association for Computational Linguistics (ACL). https://aclanthology.org/2023.findings-acl.170/

Vancouver

Borenstein N, Stanczak KE, Rolskov T, da Silva Perez N, Klein Kafer N, Augenstein I. Measuring Intersectional Biases in Historical Documents. In Findings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023. Vol. ACL 2023. Association for Computational Linguistics (ACL). 2023. p. 2711–2730

Author

Borenstein, Nadav ; Stanczak, Karolina Ewa ; Rolskov, Thea ; da Silva Perez, Natália ; Klein Kafer, Natacha ; Augenstein, Isabelle. / Measuring Intersectional Biases in Historical Documents. Findings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023. Vol. ACL 2023. Association for Computational Linguistics (ACL), 2023. pp. 2711–2730

Bibtex

@inproceedings{f1d9795c188a46c996c3eddc13f322bf,
title = "Measuring Intersectional Biases in Historical Documents",
abstract = "Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in an archaic language. In this paper, we investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries). Our analyses are performed along the axes of gender, race, and their intersection. We examine these biases by conducting a temporal study in which we measure the development of lexical associations using distributional semantics models and word embeddings. Further, we evaluate the effectiveness of techniques designed to process OCR-generated data and assess their stability when trained on and applied to the noisy historical newspapers. We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset. We provide evidence that gender and racial biases are interdependent, and their intersection triggers distinct effects. These findings align with the theory of intersectionality, which stresses that biases affecting people with multiple marginalised identities compound to more than the sum of their constituents.",
author = "Nadav Borenstein and Stanczak, {Karolina Ewa} and Thea Rolskov and {da Silva Perez}, Nat{\'a}lia and {Klein Kafer}, Natacha and Isabelle Augenstein",
year = "2023",
language = "English",
volume = "ACL 2023",
pages = "2711--2730",
booktitle = "Findings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023",
publisher = "Association for Computational Linguistics (ACL)",
address = "United States",
note = "61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 ; Conference date: 09-07-2023 Through 14-07-2023",

}

RIS

TY - GEN

T1 - Measuring Intersectional Biases in Historical Documents

AU - Borenstein, Nadav

AU - Stanczak, Karolina Ewa

AU - Rolskov, Thea

AU - da Silva Perez, Natália

AU - Klein Kafer, Natacha

AU - Augenstein, Isabelle

PY - 2023

Y1 - 2023

N2 - Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in an archaic language. In this paper, we investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries). Our analyses are performed along the axes of gender, race, and their intersection. We examine these biases by conducting a temporal study in which we measure the development of lexical associations using distributional semantics models and word embeddings. Further, we evaluate the effectiveness of techniques designed to process OCR-generated data and assess their stability when trained on and applied to the noisy historical newspapers. We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset. We provide evidence that gender and racial biases are interdependent, and their intersection triggers distinct effects. These findings align with the theory of intersectionality, which stresses that biases affecting people with multiple marginalised identities compound to more than the sum of their constituents.

AB - Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in an archaic language. In this paper, we investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries). Our analyses are performed along the axes of gender, race, and their intersection. We examine these biases by conducting a temporal study in which we measure the development of lexical associations using distributional semantics models and word embeddings. Further, we evaluate the effectiveness of techniques designed to process OCR-generated data and assess their stability when trained on and applied to the noisy historical newspapers. We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset. We provide evidence that gender and racial biases are interdependent, and their intersection triggers distinct effects. These findings align with the theory of intersectionality, which stresses that biases affecting people with multiple marginalised identities compound to more than the sum of their constituents.

M3 - Article in proceedings

VL - ACL 2023

SP - 2711

EP - 2730

BT - Findings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023

PB - Association for Computational Linguistics (ACL)

T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023

Y2 - 9 July 2023 through 14 July 2023

ER -

ID: 375982123