Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation

Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed

Documents

Full text
Publisher's published version, 2.19 MB, PDF document

Hirasawa, Tosho
Bugliarello, Emanuele
Elliott, Desmond
Komachi, Mamoru

Multimodal machine translation (MMT) systems have been successfully developed in recent years for a few language pairs. However, training such models usually requires tuples of source-language text, target-language text, and images. Obtaining these data involves expensive human annotation, making it difficult to develop models for unseen text-only language pairs. In this work, we propose the task of zero-shot cross-modal machine translation, which aims to transfer multimodal knowledge from an existing multimodal parallel corpus into a new translation direction. We also introduce a novel MMT model with a visual prediction network that learns visual features grounded on multimodal parallel data and provides pseudo-features for text-only language pairs. With this training paradigm, our MMT model outperforms its text-only counterpart. In our extensive analyses, we show that (i) the selection of visual features is important, and (ii) training on image-aware translations and being grounded on a similar language pair are mandatory. Our code is available at https://github.com/toshohirasawa/zeroshot-crossmodal-mt.
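The abstract describes a visual prediction network that learns visual features from multimodal parallel data and then supplies pseudo-features for text-only language pairs. The sketch below illustrates that general idea under simplifying assumptions of our own; it is not the authors' architecture (their code is in the repository linked above). It assumes a source sentence is represented by mean-pooled encoder token states, that the predictor is a single linear layer, and that it is trained with a mean-squared-error regression loss against image feature vectors. The names `mean_pool`, `VisualPredictor`, and `train` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_states: Sequence[Sequence[float]]) -> Vector:
    """Average token-level encoder states into one sentence vector (assumed pooling)."""
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / len(token_states) for d in range(dim)]


class VisualPredictor:
    """Hypothetical linear map from a pooled text vector to an image-feature vector."""

    def __init__(self, text_dim: int, image_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(text_dim)]
                        for _ in range(image_dim)]
        self.bias = [0.0] * image_dim

    def predict(self, text_vec: Sequence[float]) -> Vector:
        # One output per image-feature dimension: w_i . x + b_i
        return [sum(w * x for w, x in zip(row, text_vec)) + b
                for row, b in zip(self.weights, self.bias)]

    def mse(self, text_vec: Sequence[float], image_feat: Sequence[float]) -> float:
        pred = self.predict(text_vec)
        return sum((p - t) ** 2 for p, t in zip(pred, image_feat)) / len(image_feat)

    def sgd_step(self, text_vec: Sequence[float], image_feat: Sequence[float], lr: float) -> None:
        pred = self.predict(text_vec)
        n = len(image_feat)
        for i, (p, t) in enumerate(zip(pred, image_feat)):
            grad = 2.0 * (p - t) / n  # d(MSE)/d(pred_i)
            for j, x in enumerate(text_vec):
                self.weights[i][j] -= lr * grad * x
            self.bias[i] -= lr * grad


def train(predictor: VisualPredictor,
          pairs: Sequence[Tuple[Sequence[Sequence[float]], Sequence[float]]],
          epochs: int = 500, lr: float = 0.1) -> None:
    """Fit the predictor on (token states, image feature) pairs from multimodal data."""
    for _ in range(epochs):
        for tokens, image_feat in pairs:
            predictor.sgd_step(mean_pool(tokens), image_feat, lr)


if __name__ == "__main__":
    # Toy "multimodal" pairs: source-sentence token states and the paired image feature.
    pairs = [
        ([[1.0, 0.0], [1.0, 0.0]], [1.0, 2.0]),
        ([[0.0, 1.0]], [3.0, -1.0]),
    ]
    predictor = VisualPredictor(text_dim=2, image_dim=2)
    train(predictor, pairs)
    for tokens, feat in pairs:
        print("train MSE:", predictor.mse(mean_pool(tokens), feat))
    # Text-only sentence with no image: predict a pseudo visual feature instead.
    print("pseudo-feature:", predictor.predict(mean_pool([[0.5, 0.5], [1.0, 0.0]])))
```

On a text-only language pair, the predicted vector would stand in for the image feature that the multimodal model otherwise receives; how the actual model consumes such pseudo-features is not modeled here.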

Original language	English
Title	Proceedings of the 8th Conference on Machine Translation, WMT 2023
Publisher	Association for Computational Linguistics (ACL)
Publication date	2023
Pages	520-533
ISBN (Electronic)	9798891760417
DOI	https://doi.org/10.18653/v1/2023.wmt-1.47
Status	Published - 2023
Event	8th Conference on Machine Translation, WMT 2023 - Singapore, Singapore. Duration: 6 Dec 2023 → 7 Dec 2023

Conference

Conference	8th Conference on Machine Translation, WMT 2023
Country	Singapore
City	Singapore
Period	06/12/2023 → 07/12/2023

ID: 377814940

Department of Computer Science