Compositional Generalization in Image Captioning
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Compositional Generalization in Image Captioning. / Nikolaus, Mitja; Abdou, Mostafa; Lamm, Matthew; Aralikatte, Rahul; Elliott, Desmond.
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics, 2019. p. 87-98.
RIS
TY - GEN
T1 - Compositional Generalization in Image Captioning
AU - Nikolaus, Mitja
AU - Abdou, Mostafa
AU - Lamm, Matthew
AU - Aralikatte, Rahul
AU - Elliott, Desmond
PY - 2019/11/1
Y1 - 2019/11/1
N2 - Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation and image–sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts than state-of-the-art captioning models.
AB - Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation and image–sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts than state-of-the-art captioning models.
U2 - 10.18653/v1/K19-1009
DO - 10.18653/v1/K19-1009
M3 - Article in proceedings
SP - 87
EP - 98
BT - Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
PB - Association for Computational Linguistics
T2 - 23rd Conference on Computational Natural Language Learning
Y2 - 3 November 2019 through 4 November 2019
ER -
ID: 230849989
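The abstract describes a decoding step that re-ranks candidate captions by their similarity to the image. The minimal sketch below illustrates that idea under stated assumptions. It assumes that each candidate caption (for example, one produced by beam search) and the image have already been embedded into a shared image–sentence space. It also uses cosine similarity as the score. The function names `cosine_similarity` and `rerank_by_image_similarity` are hypothetical, and so are the toy vectors. This is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_by_image_similarity(
    image_embedding: Vector,
    candidates: List[Tuple[str, Vector]],
) -> List[Tuple[str, float]]:
    """Sort candidate captions by similarity to the image, highest first.

    Hypothetical interface: `candidates` pairs each caption (e.g. from beam
    search) with its embedding in an assumed shared image-sentence space.
    """
    scored = [(caption, cosine_similarity(image_embedding, emb)) for caption, emb in candidates]
    # sorted() is stable, so captions with equal scores keep their original beam order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up embeddings purely for illustration.
    image = [1.0, 0.0, 0.5]
    beam = [
        ("a dog on a couch", [0.2, 1.0, 0.0]),
        ("a black cat sitting on a couch", [0.9, 0.1, 0.4]),
        ("a couch in a room", [0.5, 0.5, 0.0]),
    ]
    for caption, score in rerank_by_image_similarity(image, beam):
        print(f"{score:.3f}  {caption}")
```

Keeping the scoring function separate from the sort means a different similarity measure could replace it without changing the re-ranking logic.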