Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models. / Maroudas, Stelios; Legkas, Sotiris; Malakasiotis, Prodromos; Chalkidis, Ilias.

NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop. Association for Computational Linguistics (ACL), 2022. p. 88-110.


Harvard

Maroudas, S, Legkas, S, Malakasiotis, P & Chalkidis, I 2022, Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models. in NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop. Association for Computational Linguistics (ACL), pp. 88-110, 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, 08/12/2022. <https://aclanthology.org/2022.nllp-1.8>

APA

Maroudas, S., Legkas, S., Malakasiotis, P., & Chalkidis, I. (2022). Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models. In NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop (pp. 88-110). Association for Computational Linguistics (ACL). https://aclanthology.org/2022.nllp-1.8

Vancouver

Maroudas S, Legkas S, Malakasiotis P, Chalkidis I. Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models. In NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop. Association for Computational Linguistics (ACL). 2022. p. 88-110

Author

Maroudas, Stelios; Legkas, Sotiris; Malakasiotis, Prodromos; Chalkidis, Ilias. / Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models. NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop. Association for Computational Linguistics (ACL), 2022. pp. 88-110

Bibtex

@inproceedings{70c4152d2c57402d8f98907183d5cd77,
title = "Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models",
abstract = "In the era of billion-parameter-sized Language Models (LMs), start-ups have to follow trends and adapt their technology accordingly. Nonetheless, there are open challenges since the development and deployment of large models comes with a need for high computational resources and has economical consequences. In this work, we follow the steps of the R&D group of a modern legal-tech start-up and present important insights on model development and deployment. We start from ground zero by pre-training multiple domain-specific multi-lingual LMs which are a better fit to contractual and regulatory text compared to the available alternatives (XLM-R). We present benchmark results of such models in a half-public half-private legal benchmark comprising 5 downstream tasks showing the impact of larger model size. Lastly, we examine the impact of a full-scale pipeline for model compression which includes: a) Parameter Pruning, b) Knowledge Distillation, and c) Quantization: The resulting models are much more efficient without sacrificing performance at large.",
author = "Stelios Maroudas and Sotiris Legkas and Prodromos Malakasiotis and Ilias Chalkidis",
note = "Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 ; Conference date: 08-12-2022",
year = "2022",
language = "English",
pages = "88--110",
booktitle = "NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop",
publisher = "Association for Computational Linguistics (ACL)",
address = "United States",
url = "https://aclanthology.org/2022.nllp-1.8",
}

RIS

TY - GEN

T1 - Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models

T2 - 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

AU - Maroudas, Stelios

AU - Legkas, Sotiris

AU - Malakasiotis, Prodromos

AU - Chalkidis, Ilias

N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.

PY - 2022

Y1 - 2022

N2 - In the era of billion-parameter-sized Language Models (LMs), start-ups have to follow trends and adapt their technology accordingly. Nonetheless, there are open challenges since the development and deployment of large models comes with a need for high computational resources and has economical consequences. In this work, we follow the steps of the R&D group of a modern legal-tech start-up and present important insights on model development and deployment. We start from ground zero by pre-training multiple domain-specific multi-lingual LMs which are a better fit to contractual and regulatory text compared to the available alternatives (XLM-R). We present benchmark results of such models in a half-public half-private legal benchmark comprising 5 downstream tasks showing the impact of larger model size. Lastly, we examine the impact of a full-scale pipeline for model compression which includes: a) Parameter Pruning, b) Knowledge Distillation, and c) Quantization: The resulting models are much more efficient without sacrificing performance at large.

AB - In the era of billion-parameter-sized Language Models (LMs), start-ups have to follow trends and adapt their technology accordingly. Nonetheless, there are open challenges since the development and deployment of large models comes with a need for high computational resources and has economical consequences. In this work, we follow the steps of the R&D group of a modern legal-tech start-up and present important insights on model development and deployment. We start from ground zero by pre-training multiple domain-specific multi-lingual LMs which are a better fit to contractual and regulatory text compared to the available alternatives (XLM-R). We present benchmark results of such models in a half-public half-private legal benchmark comprising 5 downstream tasks showing the impact of larger model size. Lastly, we examine the impact of a full-scale pipeline for model compression which includes: a) Parameter Pruning, b) Knowledge Distillation, and c) Quantization: The resulting models are much more efficient without sacrificing performance at large.

UR - http://www.scopus.com/inward/record.url?scp=85154595338&partnerID=8YFLogxK

M3 - Article in proceedings

AN - SCOPUS:85154595338

SP - 88

EP - 110

BT - NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop

PB - Association for Computational Linguistics (ACL)

Y2 - 8 December 2022

ER -
