Multilingual and Multimodal Language Processing for All Languages


My research focuses on the design and evaluation of multimodal machine learning models that process inputs from different modalities, such as images, text, and speech. I have studied these models from the perspective of general-purpose representation learning, and in downstream tasks such as automatic image captioning and cross-modal retrieval. Most prior work in this area has concentrated on English, because datasets for other languages are scarce. In contrast, my research has emphasized multilingual processing, through the collection of non-English datasets in a culturally engaged process. I will present approaches to collecting and modelling culturally relevant multimodal data. A second strand of my research addresses tokenization-free language modelling using visual text representations, which directly connects my interests in multimodal and multilingual learning. I will also present my recent work on creating language models that can process text in any written language. I will conclude with an overview of my teaching and service contributions to the department, and my future plans for research and teaching.