PhD defence by Lasse Borgholt
Title
Semantic Speech Processing with Neural Networks
Summary
Human spoken language understanding relies heavily on contextual information. The context of a single word provides important clues for the listener to accurately recognize and understand it. If a word is mispronounced or drowned out by noise, the listener may infer the word from context. Words like "park" and "play" have different meanings depending on the context they appear in. And even when a word is completely unknown to the listener, context may help in deriving its meaning. Thus, training models to identify semantic relations from context is an important path towards computers that can mimic the human understanding of spoken language.
This idea has a long tradition in neural representation learning, where the goal is to learn data representations that are useful for other machine learning tasks. For example, in text-based natural language processing, the idea has inspired approaches for learning semantic word embeddings, such as word2vec. More recently, it has inspired the development of masked language models, such as BERT. These approaches have revolutionized natural language processing. During the course of this thesis project, speech processing has undergone a similar development. However, these models are still evolving, and there is much we do not know about what they learn, why they work, and how we can improve them. This thesis investigates machine learning models that learn semantic features directly from speech.
Assessment Committee
Associate Professor, Isabelle Augenstein, Department of Computer Science, University of Copenhagen.
Associate Professor, Karen Livescu, Toyota Technological Institute at Chicago.
Associate Professor, Florian Metze, Carnegie Mellon University.
The moderator for the defence will be
Associate Professor, Isabelle Augenstein, Department of Computer Science, University of Copenhagen.
For a digital copy of the thesis, please visit https://di.ku.dk/english/research/phd/.