DIKU Bits: Retrieval-augmented Image Captioning



Rita Ramos, Natural Language Processing (NLP) section


Inspired by retrieval-augmented language generation, I will present two approaches to image captioning that generate sentences conditioned on a set of captions retrieved from a datastore, rather than on the image alone. First, our SmallCap model shows the potential of retrieval augmentation to improve performance and reduce training costs in image captioning. I will then present LMCap, a multilingual captioning model that uses retrieval augmentation in a training-free manner.

Which courses do you teach? (BSc and MSc)

I am not currently teaching any courses, as I am completing my PhD, which I expect to finish by the end of this year.

Which technology/research/projects/startup are you excited to see the evolution of?

I'm particularly excited about work on language diversity in NLP that bridges the gap between high-resource and low-resource languages. I'm keeping a close watch on initiatives like Masakhane, SEACrowd, and Aya, which represent a significant step towards this goal.