PhD defence by Rasmus Kær Jørgensen

Title

Multilingual Natural Language Processing for Applications in the Financial Domain

Abstract

Financial data is vital for every company and is constantly generated worldwide in high volumes and many varieties. Financial text occurs in multiple languages in tasks such as analyzing stock market information, processing tax and accounting data, and handling invoices, transactions, and other financial tasks. Financial natural language processing is an emerging area of natural language processing (NLP) that is becoming increasingly important because the financial domain needs automatic systems that accurately process domain-specific data in multiple languages.

This thesis presents research that investigates multilingual NLP for applications in the financial domain. First, learning systems are devised to automate the accounting task of mapping transactions to accounts. Accounting firms need a system that autonomously learns to handle these transactions accurately, even from limited training data. Ideally, there would be a system that learns to classify transactions across companies and corporate sectors, and that is able to generalize to new companies for which little or even no historical data exist – and this thesis demonstrates that this is indeed possible.

The second line of research advances multilingual NLP in the financial domain by extending domain-adaptive pretraining to a multilingual scenario, focusing on adapting a single model to multiple languages within a specific domain. Several domain-specific resources for model evaluation are proposed, including a financial benchmark covering multiple languages for evaluating multilingual financial language models.

The third part evaluates the explanations produced by interpretation methods for multilingual NLP systems. As NLP systems are deployed and users interact with them, it is important to understand how well the employed methods perform and to what extent they align with human rationales. We study this alignment and provide a multilingual parallel corpus of human rationale annotations to benchmark models and interpretability methods in a multilingual setting.

Assessment Committee

Professor Isabelle Augenstein, DIKU
Associate Professor Dirk Hovy, Bocconi University, Milan, Italy
Associate Professor Raghava Mukkamala, Copenhagen Business School, Denmark

Moderator of defence: Daniel Hershcovich

Supervisors

Principal Supervisor Christian Igel
Co-Supervisor Lars Silberg Hansen
Co-Supervisor Peter Sergio Larsen

For an electronic copy of the thesis, please visit the PhD Programme page.