Adversarial Removal of Demographic Attributes Revisited

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Adversarial Removal of Demographic Attributes Revisited. / Barrett, Maria; Kementchedjhieva, Yova; Elazar, Yanai; Elliott, Desmond; Søgaard, Anders.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2019. p. 6329-6334.


Harvard

Barrett, M, Kementchedjhieva, Y, Elazar, Y, Elliott, D & Søgaard, A 2019, Adversarial Removal of Demographic Attributes Revisited. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp. 6329-6334, 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 03/11/2019. https://doi.org/10.18653/v1/D19-1662

APA

Barrett, M., Kementchedjhieva, Y., Elazar, Y., Elliott, D., & Søgaard, A. (2019). Adversarial Removal of Demographic Attributes Revisited. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 6329-6334). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1662

Vancouver

Barrett M, Kementchedjhieva Y, Elazar Y, Elliott D, Søgaard A. Adversarial Removal of Demographic Attributes Revisited. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics. 2019. p. 6329-6334. https://doi.org/10.18653/v1/D19-1662

Author

Barrett, Maria ; Kementchedjhieva, Yova ; Elazar, Yanai ; Elliott, Desmond ; Søgaard, Anders. / Adversarial Removal of Demographic Attributes Revisited. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2019. pp. 6329-6334

Bibtex

@inproceedings{5cdb0c5bd11840d1a526082e35129172,
title = "Adversarial Removal of Demographic Attributes Revisited",
abstract = "Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.",
author = "Maria Barrett and Yova Kementchedjhieva and Yanai Elazar and Desmond Elliott and Anders S{\o}gaard",
year = "2019",
month = nov,
day = "1",
doi = "10.18653/v1/D19-1662",
language = "English",
pages = "6329--6334",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
publisher = "Association for Computational Linguistics",
note = "2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) ; Conference date: 03-11-2019 Through 07-11-2019",

}

RIS

TY - GEN

T1 - Adversarial Removal of Demographic Attributes Revisited

AU - Barrett, Maria

AU - Kementchedjhieva, Yova

AU - Elazar, Yanai

AU - Elliott, Desmond

AU - Søgaard, Anders

PY - 2019/11/1

Y1 - 2019/11/1

N2 - Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.

AB - Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.

U2 - 10.18653/v1/D19-1662

DO - 10.18653/v1/D19-1662

M3 - Article in proceedings

SP - 6329

EP - 6334

BT - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

PB - Association for Computational Linguistics

T2 - 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Y2 - 3 November 2019 through 7 November 2019

ER -
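
The abstract above describes the paper's core protocol: a diagnostic (probing) classifier is trained to predict a protected attribute from a network's representations, and its accuracy on a held-out subsample of the same data is compared against its accuracy on a genuinely new sample. The following Python is a minimal sketch of that protocol, not the authors' implementation: the representations are synthetic stand-ins for encoder states, and the sample-specific correlation is injected by hand; all names and data here are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_sample(n, dim=64, leak=0.0):
    # Synthetic (representation, attribute) pairs. `leak` injects a
    # correlation between one representation dimension and the protected
    # attribute, mimicking leakage specific to one data sample.
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, dim))
    X[:, 0] += leak * y
    return X, y

# "Original" sample: the diagnostic classifier is trained and evaluated on
# subsamples of the same data, as in the protocol the abstract revisits.
X, y = make_sample(4000, leak=1.0)
X_tr, y_tr, X_te, y_te = X[:3000], y[:3000], X[3000:], y[3000:]

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out subsample acc:", accuracy_score(y_te, probe.predict(X_te)))

# A new sample drawn without that particular correlation: above-chance
# performance on the held-out subsample should not transfer here.
X_new, y_new = make_sample(1000, leak=0.0)
print("new sample acc:        ", accuracy_score(y_new, probe.predict(X_new)))

With the injected correlation, the probe scores well above chance on the held-out subsample but near chance on the new sample, mirroring the generalization gap the abstract reports.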
