Cross-Cultural Transfer Learning for Chinese Offensive Language Detection

Publication: Contribution to book/anthology/report, Conference contribution in proceedings, Research, Peer-reviewed

Documents

  • Full text

    Publisher's published version, 738 KB, PDF document

Detecting offensive language is a challenging task. Generalizing across different cultures and languages becomes even more challenging: besides lexical, syntactic and semantic differences, pragmatic aspects such as cultural norms and sensitivities, which are particularly relevant in this context, vary greatly. In this paper, we target Chinese offensive language detection and aim to investigate the impact of transfer learning using offensive language detection data from different cultural backgrounds, specifically Korean and English. We find that culture-specific biases in what is considered offensive negatively impact the transferability of language models (LMs) and that LMs trained on diverse cultural data are sensitive to different features in Chinese offensive language detection. In a few-shot learning scenario, however, our study shows promising prospects for non-English offensive language detection with limited resources. Our findings highlight the importance of cross-cultural transfer learning in improving offensive language detection and promoting inclusive digital spaces.

Original language: English
Title: EACL 2023 - Cross-Cultural Considerations in NLP @ EACL, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2023
Pages: 8-15
ISBN (Electronic): 9781959429517
DOI
Status: Published - 2023
Event: 1st Workshop on Cross-Cultural Considerations in NLP, C3NLP 2023 - Dubrovnik, Croatia
Duration: 5 May 2023 → …

Conference

Conference: 1st Workshop on Cross-Cultural Considerations in NLP, C3NLP 2023
Country: Croatia
City: Dubrovnik
Period: 05/05/2023 → …

Bibliographic note

Funding Information:
Thanks to the anonymous reviewers for their helpful feedback. The authors gratefully acknowledge financial support from the China Scholarship Council (CSC No. 202206070002 and No. 202206160052).

Publisher Copyright:
© 2023 Association for Computational Linguistics.

ID: 372613556