Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection. / Sen, Indira; Samory, Mattia; Wagner, Claudia; Augenstein, Isabelle.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics (ACL), 2022. p. 4716-4726.
Bibtex
@inproceedings{sen2022counterfactually,
  title = {Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection},
  author = {Sen, Indira and Samory, Mattia and Wagner, Claudia and Augenstein, Isabelle},
  booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  publisher = {Association for Computational Linguistics (ACL)},
  year = {2022},
  pages = {4716--4726},
  doi = {10.18653/v1/2022.naacl-main.347},
}
RIS
TY - GEN
T1 - Counterfactually Augmented Data and Unintended Bias
T2 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022
AU - Sen, Indira
AU - Samory, Mattia
AU - Wagner, Claudia
AU - Augenstein, Isabelle
N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Counterfactually Augmented Data (CAD) aims to improve out-of-domain generalizability, an indicator of model robustness. The improvement is credited to promoting core features of the construct over spurious artifacts that happen to correlate with it. Yet, over-relying on core features may lead to unintended model bias. In particular, construct-driven CAD (perturbations of core features) may induce models to ignore the context in which core features are used. Here, we test models for sexism and hate speech detection on challenging data: non-hateful and non-sexist usage of identity and gendered terms. On these hard cases, models trained on CAD, especially construct-driven CAD, show higher false positive rates than models trained on the original, unperturbed data. Using a diverse set of CAD (construct-driven and construct-agnostic) reduces such unintended bias.
AB - Counterfactually Augmented Data (CAD) aims to improve out-of-domain generalizability, an indicator of model robustness. The improvement is credited to promoting core features of the construct over spurious artifacts that happen to correlate with it. Yet, over-relying on core features may lead to unintended model bias. In particular, construct-driven CAD (perturbations of core features) may induce models to ignore the context in which core features are used. Here, we test models for sexism and hate speech detection on challenging data: non-hateful and non-sexist usage of identity and gendered terms. On these hard cases, models trained on CAD, especially construct-driven CAD, show higher false positive rates than models trained on the original, unperturbed data. Using a diverse set of CAD (construct-driven and construct-agnostic) reduces such unintended bias.
UR - http://www.scopus.com/inward/record.url?scp=85138414469&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.naacl-main.347
DO - 10.18653/v1/2022.naacl-main.347
M3 - Article in proceedings
AN - SCOPUS:85138414469
SP - 4716
EP - 4726
BT - Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
PB - Association for Computational Linguistics (ACL)
Y2 - 10 July 2022 through 15 July 2022
ER -
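The abstract describes construct-driven CAD as perturbing core features of the construct, such as identity and gendered terms, and evaluating false positive rates on hard cases. As a minimal illustrative sketch only, not the authors' implementation, the following Python example shows what such a perturbation and a hard-case evaluation could look like; the swap lexicon, the toy keyword classifier, and the example sentences are hypothetical placeholders.

# Illustrative sketch only: a hypothetical construct-driven
# counterfactual perturbation (swapping gendered terms) and a
# false-positive-rate check on "hard cases", i.e. non-sexist uses
# of gendered terms. Not the paper's code; lexicon, classifier,
# and data below are placeholder assumptions.

GENDER_SWAPS = {"women": "men", "men": "women", "she": "he", "he": "she"}

def perturb(text: str) -> str:
    """Create a counterfactual by swapping gendered terms (core features)."""
    return " ".join(GENDER_SWAPS.get(tok, tok) for tok in text.split())

def false_positive_rate(predict, hard_cases: list[str]) -> float:
    """FPR on hard cases whose gold label is negative (non-sexist),
    so every positive prediction counts as a false positive."""
    preds = [predict(t) for t in hard_cases]
    return sum(preds) / len(preds)

if __name__ == "__main__":
    print(perturb("she said women deserve equal pay"))
    # -> "he said men deserve equal pay"

    # A trivial keyword "model" that over-relies on core features,
    # mimicking the unintended bias the paper measures.
    keyword_model = lambda t: int(any(w in t.split() for w in GENDER_SWAPS))
    hard = ["women in stem organized a conference",
            "he thanked the women on his team"]
    print(false_positive_rate(keyword_model, hard))  # 1.0: flags every hard case

A context-sensitive model would keep the FPR on such hard cases low; the paper's finding is that training on construct-driven CAD alone can push models toward the keyword-like behavior sketched above, while mixing construct-driven and construct-agnostic CAD mitigates it.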