People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Documents

Fulltext
Final published version, 498 KB, PDF document

Indira Sen
Dennis Assenmacher
Mattia Samory
Isabelle Augenstein
Wil van der Aalst
Claudia Wagner

NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. It is therefore imperative that these models be robust to spurious features. Past work has attempted to tackle such spurious features through training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive, so in this work we assess whether this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate how useful they are for improving model robustness compared to manually generated CADs. Testing both model performance on multiple out-of-domain test sets and the efficacy of individual data points, we find that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
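To make the CAD idea concrete, the following is a minimal Python sketch, not the authors' pipeline, that checks the two properties the abstract names: the edit to the original text is small, and the label actually changes. The token-level edit distance, the `max_edits` threshold, and the keyword-based `toy_classifier` are all illustrative assumptions; in practice the label check would come from human annotators or a trained classifier.

```python
from typing import Callable


def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace-separated tokens."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, xt in enumerate(x, start=1):
        curr = [i]
        for j, yt in enumerate(y, start=1):
            cost = 0 if xt == yt else 1
            curr.append(min(prev[j] + 1,         # delete a token from a
                            curr[j - 1] + 1,     # insert a token from b
                            prev[j - 1] + cost)) # substitute (or keep) a token
        prev = curr
    return prev[-1]


def is_candidate_cad(original: str, counterfactual: str,
                     classify: Callable[[str], str], max_edits: int = 3) -> bool:
    """Hypothetical criterion: a small, non-empty edit that changes the label."""
    distance = word_edit_distance(original, counterfactual)
    minimal = 0 < distance <= max_edits
    flipped = classify(original) != classify(counterfactual)
    return minimal and flipped


def toy_classifier(text: str) -> str:
    """Toy keyword rule standing in for an annotator or trained model."""
    tokens = text.split()
    return "sexist" if "women" in tokens and "bad" in tokens else "not sexist"
```

With this toy rule, editing "all women are bad at math" into "some people are bad at math" counts as a candidate CAD, whereas changing only "math" to "maths" leaves the label unchanged. That second case mirrors the failure mode reported above for automated generators: the edit is minimal, but it does not flip the label.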

Original language	English
Title of host publication	Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Publisher	Association for Computational Linguistics (ACL)
Publication date	2023
Pages	10480-10504
ISBN (Electronic)	979-8-89176-060-8
DOIs	https://doi.org/10.18653/v1/2023.emnlp-main.649
Publication status	Published - 2023
Event	2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6 Dec 2023 → 10 Dec 2023

Conference

Conference	2023 Conference on Empirical Methods in Natural Language Processing
City	Singapore
Period	06/12/2023 → 10/12/2023

ID: 381511490

Department of Computer Science