Finding Structural Knowledge in Multimodal-BERT
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Finding Structural Knowledge in Multimodal-BERT. / Milewski, Victor; de Lhoneux, Miryam; Moens, Marie-Francine.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2022. p. 5658–5671.
RIS
TY - GEN
T1 - Finding Structural Knowledge in Multimodal-BERT
AU - Milewski, Victor
AU - de Lhoneux, Miryam
AU - Moens, Marie-Francine
PY - 2022
Y1 - 2022
N2 - In this work, we investigate the knowledge learned in the embeddings of multimodal-BERT models. More specifically, we probe their capability to store the grammatical structure of linguistic data and the structure learned over objects in visual data. To reach that goal, we first make the inherent structure of language and visuals explicit by a dependency parse of the sentences that describe the image and by the dependencies between the object regions in the image, respectively. We call this explicit visual structure the scene tree, which is based on the dependency tree of the language description. Extensive probing experiments show that the multimodal-BERT models do not encode these scene trees.
AB - In this work, we investigate the knowledge learned in the embeddings of multimodal-BERT models. More specifically, we probe their capability to store the grammatical structure of linguistic data and the structure learned over objects in visual data. To reach that goal, we first make the inherent structure of language and visuals explicit by a dependency parse of the sentences that describe the image and by the dependencies between the object regions in the image, respectively. We call this explicit visual structure the scene tree, which is based on the dependency tree of the language description. Extensive probing experiments show that the multimodal-BERT models do not encode these scene trees.
U2 - 10.18653/v1/2022.acl-long.388
DO - 10.18653/v1/2022.acl-long.388
M3 - Article in proceedings
SP - 5658
EP - 5671
BT - Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
PB - Association for Computational Linguistics
T2 - 60th Annual Meeting of the Association for Computational Linguistics
Y2 - 23 May 2022 through 25 May 2022
ER -