Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification

  • Stratos Xenouleas
  • Alexia Tsoukara
  • Giannis Panagiotakis
  • Ilias Chalkidis
  • Ion Androutsopoulos

We consider zero-shot cross-lingual transfer in legal topic classification using the recent Multi-EURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for Multi-EURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.

Title: Proceedings of the 12th Hellenic Conference on Artificial Intelligence, SETN 2022
Number of pages: 8
Publisher: Association for Computing Machinery, Inc.
ISBN (Electronic): 9781450395977
Status: Published - 2022
Event: 12th Hellenic Conference on Artificial Intelligence, SETN 2022 - Corfu, Greece
Duration: 7 Sep 2022 to 9 Sep 2022

Conference: 12th Hellenic Conference on Artificial Intelligence, SETN 2022
Sponsor: Hellenic Artificial Intelligence Society, Humanistic and Social Informatics Laboratory (HILab), Ionian University, Department of Informatics
Name: ACM International Conference Proceeding Series

Funding Information:
This work is partly funded by the Innovation Fund Denmark (IFD) under File No. 0175-00011A. This research has also been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE ( 2 -03849).

© 2022 ACM.

