Efficient, Adaptable and Interpretable NLP

Research output: Book/Report › Ph.D. thesis › Research

Standard

Efficient, Adaptable and Interpretable NLP. / Rethmeier, Nils.

Department of Computer Science, Faculty of Science, University of Copenhagen, 2023. 195 p.

Harvard

Rethmeier, N 2023, Efficient, Adaptable and Interpretable NLP. Department of Computer Science, Faculty of Science, University of Copenhagen.

APA

Rethmeier, N. (2023). Efficient, Adaptable and Interpretable NLP. Department of Computer Science, Faculty of Science, University of Copenhagen.

Vancouver

Rethmeier N. Efficient, Adaptable and Interpretable NLP. Department of Computer Science, Faculty of Science, University of Copenhagen, 2023. 195 p.

Author

Rethmeier, Nils. / Efficient, Adaptable and Interpretable NLP. Department of Computer Science, Faculty of Science, University of Copenhagen, 2023. 195 p.

Bibtex

@phdthesis{f82ef28f7189426585db1c7a9d975be8,
title = "Efficient, Adaptable and Interpretable NLP",
abstract = "In natural language processing (NLP), a central concern is how to develop and evaluate language model pretraining that better transfers and adapts to downstream tasks. Due to their black box character, it is hard to understand how models transfers knowledge and adapt it during pretraining and downstream application. Ultimately, the goal of language model pretraining is to develop methods that improve transfer and adaption to open-ended downstream tasks, while using training data, compute and model parameters as efficiently as possible. This thesis presents my research for the goal of “developing efficient, adaptable, and interpretable NLP representations{"}, which expands upon existing methodology for language model pretraining and evaluation along three dimensions. (I) Improve our understanding of adaptation at the representation level by contributing a transfer and adaptation interpretability method in two works. The first work proposes a method to quantify knowledge change during pretraining, zero-shot application and fine-tuning. A second work applies this method to in-hospital patient outcome prediction to identify knowledge redundancies, unused data sources, and quantify the impact of individual model components. (II) Contribute best practices and new methods for contrastive learning of language models and NLP representations. A third work surveys self-supervised and supervised contrastive methods in NLP to identify important theoretical aspects like energybased models (EBM) and properties of contrastive learning objectives to inform representation learning design in NLP. A forth work uses these insights to propose a state-of-the-art citation prediction language model that introduces an efficient contrastive citation neighborhood based pretraining method. (III) Make self-supervised pretraining more data-efficient and supervised adaptation more label-efficient by proposing a contrastive and a non-contrastive pretraining method. The fifth work proposes a contrastive language model that unifies selfsupervised pretraining and supervised fine-tuning. This enables data and compute efficient pretraining of a contrastive language model from small data to reduce costs, while markedly improving zero-shot, few-shot and long-tail performance compared to large pretrained language models. The sixth and final work proposes a retrofitting method for word-embeddings in a self-supervised manner to allow data-efficient zero-shot adaptation of representations for classification, analogy and similarity tasks without using any target data.",
author = "Nils Rethmeier",
year = "2023",
language = "English",
publisher = "Department of Computer Science, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Efficient, Adaptable and Interpretable NLP

AU - Rethmeier, Nils

PY - 2023

Y1 - 2023

N2 - In natural language processing (NLP), a central concern is how to develop and evaluate language model pretraining that better transfers and adapts to downstream tasks. Due to their black-box character, it is hard to understand how models transfer knowledge and adapt it during pretraining and downstream application. Ultimately, the goal of language model pretraining is to develop methods that improve transfer and adaptation to open-ended downstream tasks, while using training data, compute and model parameters as efficiently as possible. This thesis presents my research toward the goal of “developing efficient, adaptable, and interpretable NLP representations”, which expands upon existing methodology for language model pretraining and evaluation along three dimensions. (I) Improve our understanding of adaptation at the representation level by contributing a transfer and adaptation interpretability method in two works. The first work proposes a method to quantify knowledge change during pretraining, zero-shot application and fine-tuning. A second work applies this method to in-hospital patient outcome prediction to identify knowledge redundancies and unused data sources, and to quantify the impact of individual model components. (II) Contribute best practices and new methods for contrastive learning of language models and NLP representations. A third work surveys self-supervised and supervised contrastive methods in NLP to identify important theoretical aspects, such as energy-based models (EBMs) and properties of contrastive learning objectives, that inform representation learning design in NLP. A fourth work uses these insights to propose a state-of-the-art citation prediction language model that introduces an efficient contrastive, citation-neighborhood-based pretraining method. (III) Make self-supervised pretraining more data-efficient and supervised adaptation more label-efficient by proposing a contrastive and a non-contrastive pretraining method. The fifth work proposes a contrastive language model that unifies self-supervised pretraining and supervised fine-tuning. This enables data- and compute-efficient pretraining of a contrastive language model from small data to reduce costs, while markedly improving zero-shot, few-shot and long-tail performance compared to large pretrained language models. The sixth and final work proposes a self-supervised retrofitting method for word embeddings that allows data-efficient zero-shot adaptation of representations for classification, analogy and similarity tasks without using any target data.

AB - In natural language processing (NLP), a central concern is how to develop and evaluate language model pretraining that better transfers and adapts to downstream tasks. Due to their black-box character, it is hard to understand how models transfer knowledge and adapt it during pretraining and downstream application. Ultimately, the goal of language model pretraining is to develop methods that improve transfer and adaptation to open-ended downstream tasks, while using training data, compute and model parameters as efficiently as possible. This thesis presents my research toward the goal of “developing efficient, adaptable, and interpretable NLP representations”, which expands upon existing methodology for language model pretraining and evaluation along three dimensions. (I) Improve our understanding of adaptation at the representation level by contributing a transfer and adaptation interpretability method in two works. The first work proposes a method to quantify knowledge change during pretraining, zero-shot application and fine-tuning. A second work applies this method to in-hospital patient outcome prediction to identify knowledge redundancies and unused data sources, and to quantify the impact of individual model components. (II) Contribute best practices and new methods for contrastive learning of language models and NLP representations. A third work surveys self-supervised and supervised contrastive methods in NLP to identify important theoretical aspects, such as energy-based models (EBMs) and properties of contrastive learning objectives, that inform representation learning design in NLP. A fourth work uses these insights to propose a state-of-the-art citation prediction language model that introduces an efficient contrastive, citation-neighborhood-based pretraining method. (III) Make self-supervised pretraining more data-efficient and supervised adaptation more label-efficient by proposing a contrastive and a non-contrastive pretraining method. The fifth work proposes a contrastive language model that unifies self-supervised pretraining and supervised fine-tuning. This enables data- and compute-efficient pretraining of a contrastive language model from small data to reduce costs, while markedly improving zero-shot, few-shot and long-tail performance compared to large pretrained language models. The sixth and final work proposes a self-supervised retrofitting method for word embeddings that allows data-efficient zero-shot adaptation of representations for classification, analogy and similarity tasks without using any target data.

M3 - Ph.D. thesis

BT - Efficient, Adaptable and Interpretable NLP

PB - Department of Computer Science, Faculty of Science, University of Copenhagen

ER -

ID: 381157889