People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection. / Sen, Indira; Assenmacher, Dennis; Samory, Mattia; Augenstein, Isabelle; van der Aalst, Wil; Wagner, Claudia.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2023. p. 10480-10504.
RIS
TY - GEN
T1 - People Make Better Edits
T2 - 2023 Conference on Empirical Methods in Natural Language Processing
AU - Sen, Indira
AU - Assenmacher, Dennis
AU - Samory, Mattia
AU - Augenstein, Isabelle
AU - van der Aalst, Wil
AU - Wagner, Claudia
PY - 2023
Y1 - 2023
N2 - NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence in this work, we assess if this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually-generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
AB - NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence in this work, we assess if this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually-generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
U2 - 10.18653/v1/2023.emnlp-main.649
DO - 10.18653/v1/2023.emnlp-main.649
M3 - Article in proceedings
SP - 10480
EP - 10504
BT - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -