PhD defence by Nils Rethmeier



Efficient, Adaptable and Interpretable NLP


In natural language processing (NLP), a central concern is how to develop and evaluate language model pretraining that better transfers and adapts to downstream tasks.

Due to their black-box nature, it is hard to understand how models transfer knowledge and adapt it during pretraining and downstream application.

Ultimately, the goal of language model pretraining is to develop methods that improve transfer and adaptation to open-ended downstream tasks, while using training data, compute and model parameters as efficiently as possible.

This thesis presents my research toward the goal of "developing efficient, adaptable, and interpretable NLP representations", which expands upon existing methodology for language model pretraining and evaluation along three dimensions.

(I) Improve our understanding of adaptation at the representation level by contributing a transfer and adaptation interpretability method in two works. 

The first work proposes a method to quantify knowledge change during pretraining, zero-shot application and fine-tuning.

A second work applies this method to in-hospital patient outcome prediction to identify knowledge redundancies and unused data sources, and to quantify the impact of individual model components.

(II) Contribute best practices and new methods for contrastive learning of language models and NLP representations.

A third work surveys self-supervised and supervised contrastive methods in NLP. It identifies important theoretical aspects, such as energy-based models (EBMs) and the properties of contrastive learning objectives, to inform the design of representation learning in NLP.

A fourth work uses these insights to propose a state-of-the-art citation prediction language model that introduces an efficient contrastive pretraining method based on citation neighborhoods.

(III) Make self-supervised pretraining more data-efficient and supervised adaptation more label-efficient by proposing a contrastive and a non-contrastive pretraining method.

The fifth work proposes a contrastive language model that unifies self-supervised pretraining and supervised fine-tuning. 

This enables data- and compute-efficient pretraining of a contrastive language model from small data to reduce costs, while markedly improving zero-shot, few-shot and long-tail performance compared to large pretrained language models.

The sixth and final work proposes a self-supervised retrofitting method for word embeddings that allows data-efficient zero-shot adaptation of representations for classification, analogy and similarity tasks without using any target data.

Assessment Committee

Professor Christina Lioma, Computer Science
Professor Ivan Habernal, Technische Universität Darmstadt and Ludwig-Maximilians-Universität München, Germany
Professor Roman Klinger, University of Stuttgart, Germany

Leader of defence: Assistant Professor Raghavendra Selvan


Principal Supervisor: Professor Isabelle Augenstein

Zoom link

For an electronic copy of the thesis, please visit the PhD Programme page