LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Documents

  • Fulltext

    Publisher's published version, 294 KB, PDF document

In this work, we conduct a detailed analysis of the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities, which we define as the upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora used as important dimensions in our study. To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs. We release two new legal PLMs trained on LeXFiles and evaluate them alongside others on LegalLAMA and LexGLUE. We find that probing performance strongly correlates with upstream performance in related legal topics. On the other hand, downstream performance is mainly driven by the model's size and prior legal knowledge, which can be estimated by upstream and probing performance. Based on these findings, we conclude that both dimensions are important for those developing domain-specific PLMs.

Original language: English
Title: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Number of pages: 23
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2023
Pages: 15513-15535
ISBN (electronic): 9781959429722
DOI
Status: Published - 2023
Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 2023 – 14 Jul 2023

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country: Canada
City: Toronto
Period: 09/07/2023 – 14/07/2023
Sponsors: Bloomberg Engineering, et al., Google Research, Liveperson, Meta, Microsoft

Bibliographic note

Funding Information:
This work was partly funded by the Innovation Fund Denmark (IFD, https://innovationsfonden.dk/en) and the Fonds de recherche du Québec – Nature et technologies (FRQNT, https://frq.gouv.qc.ca/nature-et-technologies/).

Publisher Copyright:
© 2023 Association for Computational Linguistics.

ID: 372528222