Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
In the era of billion-parameter Language Models (LMs), start-ups have to follow trends and adapt their technology accordingly. Nonetheless, open challenges remain, since the development and deployment of large models require substantial computational resources and have economic consequences. In this work, we follow the steps of the R&D group of a modern legal-tech start-up and present important insights on model development and deployment. We start from ground zero by pre-training multiple domain-specific multi-lingual LMs that are a better fit for contractual and regulatory text than the available alternatives (XLM-R). We present benchmark results for these models on a half-public, half-private legal benchmark comprising 5 downstream tasks, showing the impact of larger model size. Lastly, we examine the impact of a full-scale pipeline for model compression, which includes: a) Parameter Pruning, b) Knowledge Distillation, and c) Quantization. The resulting models are much more efficient without a large sacrifice in performance.
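The three compression steps named in the abstract can be illustrated with a minimal, generic sketch. This record does not describe the authors' actual pipeline, so everything below is an assumption for illustration only: the function names (`magnitude_prune`, `distillation_loss`, `quantize_int8`, `dequantize`) are hypothetical, pruning is shown as simple magnitude pruning, distillation as a temperature-softened KL loss, and quantization as symmetric per-tensor int8.

```python
import math

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization; returns (integers, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    ints = [max(-127, min(127, round(w / scale))) for w in weights]
    return ints, scale

def dequantize(ints, scale):
    """Map int8 values back to approximate floating-point weights."""
    return [i * scale for i in ints]

# Toy usage on a four-weight "layer" (illustrative values only).
pruned = magnitude_prune([0.5, -0.01, 0.2, 0.03], sparsity=0.5)
ints, scale = quantize_int8(pruned)
restored = dequantize(ints, scale)
loss = distillation_loss([0.1, 0.2, 3.0], [0.0, 0.5, 2.5])
```

In a real setting the same ideas would be applied per layer of a Transformer with a deep-learning framework; the toy lists here only show what each step does to the numbers.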
Original language | English |
---|---|
Title of host publication | NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop |
Number of pages | 23 |
Publisher | Association for Computational Linguistics (ACL) |
Publication date | 2022 |
Pages | 88-110 |
ISBN (Electronic) | 9781959429180 |
Publication status | Published - 2022 |
Event | 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates. Duration: 8 Dec 2022 → … |
Conference
Conference | 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 |
---|---|
Country | United Arab Emirates |
City | Abu Dhabi |
Period | 08/12/2022 → … |
Sponsor | Bloomberg, European Research Council (ERC), LBox |
Bibliographical note
Publisher Copyright:
© 2022 Association for Computational Linguistics.
Links
- https://aclanthology.org/2022.nllp-1.8
Final published version
ID: 358726422