Finding Structural Knowledge in Multimodal-BERT
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Finding Structural Knowledge in Multimodal-BERT. / Milewski, Victor; de Lhoneux, Miryam; Moens, Marie-Francine.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2022. p. 5658–5671.
RIS
TY - GEN
T1 - Finding Structural Knowledge in Multimodal-BERT
AU - Milewski, Victor
AU - de Lhoneux, Miryam
AU - Moens, Marie-Francine
PY - 2022
Y1 - 2022
N2 - In this work, we investigate the knowledge learned in the embeddings of multimodal-BERT models. More specifically, we probe their capability to store the grammatical structure of linguistic data and the structure learned over objects in visual data. To reach that goal, we first make the inherent structure of language and visuals explicit by a dependency parse of the sentences that describe the image and by the dependencies between the object regions in the image, respectively. We call this explicit visual structure the scene tree, which is based on the dependency tree of the language description. Extensive probing experiments show that the multimodal-BERT models do not encode these scene trees.
AB - In this work, we investigate the knowledge learned in the embeddings of multimodal-BERT models. More specifically, we probe their capability to store the grammatical structure of linguistic data and the structure learned over objects in visual data. To reach that goal, we first make the inherent structure of language and visuals explicit by a dependency parse of the sentences that describe the image and by the dependencies between the object regions in the image, respectively. We call this explicit visual structure the scene tree, which is based on the dependency tree of the language description. Extensive probing experiments show that the multimodal-BERT models do not encode these scene trees.
U2 - 10.18653/v1/2022.acl-long.388
DO - 10.18653/v1/2022.acl-long.388
M3 - Article in proceedings
SP - 5658
EP - 5671
BT - Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
PB - Association for Computational Linguistics
T2 - 60th Annual Meeting of the Association for Computational Linguistics
Y2 - 23 May 2022 through 25 May 2022
ER -