Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation
Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed
Standard
Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation. / Hirasawa, Tosho; Bugliarello, Emanuele; Elliott, Desmond; Komachi, Mamoru.
Proceedings of the 8th Conference on Machine Translation, WMT 2023. Association for Computational Linguistics (ACL), 2023. pp. 520-533.
RIS
TY - GEN
T1 - Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation
AU - Hirasawa, Tosho
AU - Bugliarello, Emanuele
AU - Elliott, Desmond
AU - Komachi, Mamoru
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Multimodal machine translation (MMT) systems have been successfully developed in recent years for a few language pairs. However, training such models usually requires tuples of a source language text, target language text, and images. Obtaining these data involves expensive human annotations, making it difficult to develop models for unseen text-only language pairs. In this work, we propose the task of zero-shot cross-modal machine translation, aiming to transfer multimodal knowledge from an existing multimodal parallel corpus into a new translation direction. We also introduce a novel MMT model with a visual prediction network to learn visual features grounded on multimodal parallel data and provide pseudo-features for text-only language pairs. With this training paradigm, our MMT model outperforms its text-only counterpart. In our extensive analyses, we show that (i) the selection of visual features is important, and (ii) training on image-aware translations and being grounded on a similar language pair are mandatory. Our code is available at https://github.com/toshohirasawa/zeroshot-crossmodal-mt.
AB - Multimodal machine translation (MMT) systems have been successfully developed in recent years for a few language pairs. However, training such models usually requires tuples of a source language text, target language text, and images. Obtaining these data involves expensive human annotations, making it difficult to develop models for unseen text-only language pairs. In this work, we propose the task of zero-shot cross-modal machine translation, aiming to transfer multimodal knowledge from an existing multimodal parallel corpus into a new translation direction. We also introduce a novel MMT model with a visual prediction network to learn visual features grounded on multimodal parallel data and provide pseudo-features for text-only language pairs. With this training paradigm, our MMT model outperforms its text-only counterpart. In our extensive analyses, we show that (i) the selection of visual features is important, and (ii) training on image-aware translations and being grounded on a similar language pair are mandatory. Our code is available at https://github.com/toshohirasawa/zeroshot-crossmodal-mt.
U2 - 10.18653/v1/2023.wmt-1.47
DO - 10.18653/v1/2023.wmt-1.47
M3 - Article in proceedings
AN - SCOPUS:85179138715
SP - 520
EP - 533
BT - Proceedings of the 8th Conference on Machine Translation, WMT 2023
PB - Association for Computational Linguistics (ACL)
T2 - 8th Conference on Machine Translation, WMT 2023
Y2 - 6 December 2023 through 7 December 2023
ER -
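
The abstract above describes a visual prediction network that learns image features from multimodal parallel data and supplies pseudo-features when translating text-only language pairs. As a rough illustration of that general idea only (not the authors' implementation; the module name VisualPredictionNet, the feature dimensions, the mean-pooling, and the MSE grounding loss below are all assumptions), a minimal PyTorch sketch:

import torch
import torch.nn as nn

class VisualPredictionNet(nn.Module):
    """Hypothetical predictor: pooled source-text states -> global image feature."""
    def __init__(self, d_text: int = 512, d_image: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_text, d_text),
            nn.ReLU(),
            nn.Linear(d_text, d_image),
        )

    def forward(self, text_states: torch.Tensor) -> torch.Tensor:
        # text_states: (batch, seq_len, d_text); mean-pool over tokens,
        # then map into the image-feature space.
        pooled = text_states.mean(dim=1)
        return self.mlp(pooled)  # (batch, d_image) pseudo visual feature

predictor = VisualPredictionNet()

# Training on the multimodal pair (with images): regress the predicted
# feature onto the real image feature alongside the usual MT loss.
text_states = torch.randn(4, 20, 512)  # dummy encoder outputs
image_feats = torch.randn(4, 2048)     # dummy precomputed image features
grounding_loss = nn.functional.mse_loss(predictor(text_states), image_feats)

# Zero-shot direction (text-only, no images): feed the *predicted*
# feature to the multimodal decoder in place of a real image feature.
with torch.no_grad():
    pseudo_visual = predictor(text_states)

The point of the sketch is the substitution step at the end: the predictor trained on the image-annotated pair stands in for missing images in the new translation direction, which is what makes the cross-modal transfer zero-shot.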