PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

PAELLA : Parameter-Efficient Lightweight Language-Agnostic Captioning Model. / Ramos, Rita; Bugliarello, Emanuele; Martins, Bruno; Elliott, Desmond.

Findings of the Association for Computational Linguistics: NAACL 2024. ed. / Kevin Duh; Helena Gomez; Steven Bethard. Association for Computational Linguistics (ACL), 2024. p. 3549-3564.

Harvard

Ramos, R, Bugliarello, E, Martins, B & Elliott, D 2024, PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model. in K Duh, H Gomez & S Bethard (eds), Findings of the Association for Computational Linguistics: NAACL 2024. Association for Computational Linguistics (ACL), pp. 3549-3564, 2024 Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, 16/06/2024. <https://aclanthology.org/2024.findings-naacl.225/>

APA

Ramos, R., Bugliarello, E., Martins, B., & Elliott, D. (2024). PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model. In K. Duh, H. Gomez, & S. Bethard (Eds.), Findings of the Association for Computational Linguistics: NAACL 2024 (pp. 3549-3564). Association for Computational Linguistics (ACL). https://aclanthology.org/2024.findings-naacl.225/

Vancouver

Ramos R, Bugliarello E, Martins B, Elliott D. PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model. In Duh K, Gomez H, Bethard S, editors, Findings of the Association for Computational Linguistics: NAACL 2024. Association for Computational Linguistics (ACL). 2024. p. 3549-3564

Author

Ramos, Rita ; Bugliarello, Emanuele ; Martins, Bruno ; Elliott, Desmond. / PAELLA : Parameter-Efficient Lightweight Language-Agnostic Captioning Model. Findings of the Association for Computational Linguistics: NAACL 2024. editor / Kevin Duh ; Helena Gomez ; Steven Bethard. Association for Computational Linguistics (ACL), 2024. pp. 3549-3564

BibTeX

@inproceedings{5eb92126395b4cab9fd7005370723441,
title = "PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model",
abstract = "We introduce PAELLA, a Parameter-Efficient Lightweight Language-Agnostic image captioning model designed to be both parameter and data-efficient using retrieval augmentation. The model is trained by learning a small mapping network with 34M parameters between a pre-trained visual model and a multilingual language model that is conditioned on two types of input: (i) the image itself, and (ii) a set of retrieved captions in the target language. The retrieved examples play a key role in guiding the model to generate captions across languages. Through retrieval, the model can be lightweight in terms of the number of trainable parameters, which only exist in its mapping network, and also in the amount of multilingual training data that is required. Experiments on the XM3600 dataset, featuring 36 languages, show that PAELLA can outperform or compete against some models with 3-77× more learned parameters and 35-863× more data, particularly in low-resource languages. We also find that PAELLA can be trained on only monolingual data and still show strong zero-shot abilities in other languages.",
author = "Rita Ramos and Emanuele Bugliarello and Bruno Martins and Desmond Elliott",
note = "Publisher Copyright: {\textcopyright} 2024 Association for Computational Linguistics.; 2024 Findings of the Association for Computational Linguistics: NAACL 2024 ; Conference date: 16-06-2024 Through 21-06-2024",
year = "2024",
language = "English",
pages = "3549--3564",
editor = "Kevin Duh and Helena Gomez and Steven Bethard",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
publisher = "Association for Computational Linguistics (ACL)",
address = "United States",

}

RIS

TY - GEN

T1 - PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model

T2 - 2024 Findings of the Association for Computational Linguistics: NAACL 2024

AU - Ramos, Rita

AU - Bugliarello, Emanuele

AU - Martins, Bruno

AU - Elliott, Desmond

N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.

PY - 2024

Y1 - 2024

N2 - We introduce PAELLA, a Parameter-Efficient Lightweight Language-Agnostic image captioning model designed to be both parameter and data-efficient using retrieval augmentation. The model is trained by learning a small mapping network with 34M parameters between a pre-trained visual model and a multilingual language model that is conditioned on two types of input: (i) the image itself, and (ii) a set of retrieved captions in the target language. The retrieved examples play a key role in guiding the model to generate captions across languages. Through retrieval, the model can be lightweight in terms of the number of trainable parameters, which only exist in its mapping network, and also in the amount of multilingual training data that is required. Experiments on the XM3600 dataset, featuring 36 languages, show that PAELLA can outperform or compete against some models with 3-77× more learned parameters and 35-863× more data, particularly in low-resource languages. We also find that PAELLA can be trained on only monolingual data and still show strong zero-shot abilities in other languages.

AB - We introduce PAELLA, a Parameter-Efficient Lightweight Language-Agnostic image captioning model designed to be both parameter and data-efficient using retrieval augmentation. The model is trained by learning a small mapping network with 34M parameters between a pre-trained visual model and a multilingual language model that is conditioned on two types of input: (i) the image itself, and (ii) a set of retrieved captions in the target language. The retrieved examples play a key role in guiding the model to generate captions across languages. Through retrieval, the model can be lightweight in terms of the number of trainable parameters, which only exist in its mapping network, and also in the amount of multilingual training data that is required. Experiments on the XM3600 dataset, featuring 36 languages, show that PAELLA can outperform or compete against some models with 3-77× more learned parameters and 35-863× more data, particularly in low-resource languages. We also find that PAELLA can be trained on only monolingual data and still show strong zero-shot abilities in other languages.

M3 - Article in proceedings

AN - SCOPUS:85197894973

SP - 3549

EP - 3564

BT - Findings of the Association for Computational Linguistics: NAACL 2024

A2 - Duh, Kevin

A2 - Gomez, Helena

A2 - Bethard, Steven

PB - Association for Computational Linguistics (ACL)

Y2 - 16 June 2024 through 21 June 2024

ER -
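For readers who want to picture the setup described in the abstract, the sketch below illustrates one way a retrieval-augmented captioning model of this kind can be wired together: a small trainable mapping network projects frozen image features into the embedding space of a frozen multilingual language model, which is additionally conditioned on captions retrieved in the target language. All module names, dimensions, and layer choices here are illustrative assumptions and are not taken from the paper or its code.

# Minimal, hypothetical sketch (not the authors' implementation): a small trainable
# mapping network bridges a frozen vision encoder and a frozen multilingual language
# model, which is also conditioned on retrieved captions in the target language.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """The only trainable component: projects image features into the LM embedding space."""
    def __init__(self, visual_dim: int, lm_dim: int, prefix_len: int = 4):
        super().__init__()
        self.prefix_len = prefix_len
        self.proj = nn.Sequential(
            nn.Linear(visual_dim, lm_dim * prefix_len),
            nn.Tanh(),
            nn.Linear(lm_dim * prefix_len, lm_dim * prefix_len),
        )

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # (batch, visual_dim) -> (batch, prefix_len, lm_dim)
        return self.proj(visual_feats).view(visual_feats.size(0), self.prefix_len, -1)

def build_lm_inputs(image_prefix, retrieved_embeds, target_embeds):
    """Concatenate the image prefix, token embeddings of retrieved target-language
    captions, and the (shifted) target caption embeddings; a frozen multilingual
    LM then decodes the caption from this combined sequence."""
    return torch.cat([image_prefix, retrieved_embeds, target_embeds], dim=1)

if __name__ == "__main__":
    batch, visual_dim, lm_dim = 2, 768, 512          # illustrative sizes only
    mapper = MappingNetwork(visual_dim, lm_dim)
    image_feats = torch.randn(batch, visual_dim)     # frozen vision-encoder output
    retrieved = torch.randn(batch, 3 * 16, lm_dim)   # e.g. 3 retrieved captions x 16 tokens
    target = torch.randn(batch, 16, lm_dim)          # target-caption token embeddings
    lm_inputs = build_lm_inputs(mapper(image_feats), retrieved, target)
    print(lm_inputs.shape)                           # torch.Size([2, 68, 512])

Keeping the vision encoder and language model frozen is what confines the learned parameters to the mapping network alone, which matches the abstract's description of where the trainable parameters live.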
