Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

  • Stratos Xenouleas
  • Alexia Tsoukara
  • Giannis Panagiotakis
  • Ilias Chalkidis
  • Ion Androutsopoulos

We consider zero-shot cross-lingual transfer in legal topic classification using the recent Multi-EURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for Multi-EURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.

Original language: English
Title of host publication: Proceedings of the 12th Hellenic Conference on Artificial Intelligence, SETN 2022
Number of pages: 8
Publisher: Association for Computing Machinery, Inc.
Publication date: 2022
Article number: 19
ISBN (Electronic): 9781450395977
Publication status: Published - 2022
Event: 12th Hellenic Conference on Artificial Intelligence, SETN 2022 - Corfu, Greece
Duration: 7 Sep 2022 to 9 Sep 2022

Conference

Conference: 12th Hellenic Conference on Artificial Intelligence, SETN 2022
Country: Greece
City: Corfu
Period: 07/09/2022 to 09/09/2022
Sponsor: Hellenic Artificial Intelligence Society, Humanistic and Social Informatics Laboratory (HILab), Ionian University, Department of Informatics
Series: ACM International Conference Proceeding Series

Bibliographical note

Publisher Copyright:
© 2022 ACM.

Research areas

  • legal text classification, natural language processing, zero-shot cross-lingual transfer learning
