Understanding the Effect of Textual Adversaries in Multimodal Machine Translation

Research output: Chapter in Book/Report/Conference proceeding, Article in proceedings, Research, peer-review


It is commonly assumed that multimodal machine translation systems are better than text-only systems at translating phrases that have a direct correspondence in the image. This assumption has been challenged by experiments showing that state-of-the-art multimodal systems perform equally well when given randomly selected images. More recently, however, it has been shown that masking entities in the source-language sentence during training can help overcome this problem. In this paper, we conduct experiments with both visual and textual adversaries to understand the role of incorrect textual input in such systems. Our results show that when the source-language sentence contains mistakes, multimodal translation systems do not leverage the additional visual signal to produce the correct translation. We also find that textual adversaries degrade translation performance significantly more than visual adversaries do.
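The two kinds of adversarial input contrasted above can be illustrated with a small sketch. The abstract does not specify how the authors built them, so this is an assumed construction rather than the paper's procedure. The visual adversary pairs every sentence with a different sentence's image, using a random derangement built with Sattolo's algorithm. The textual adversary swaps one source word using a hypothetical substitution table such as `{"dog": "cat"}`. The names `shuffle_images` and `corrupt_source` are illustrative only.

```python
import random
from typing import Dict, List


def shuffle_images(image_ids: List[str], seed: int = 0) -> List[str]:
    """Visual adversary (assumed construction): reassign images so that no
    position keeps its original image, i.e. a random derangement."""
    n = len(image_ids)
    if n < 2:
        return list(image_ids)  # nothing to swap with
    rng = random.Random(seed)
    perm = list(range(n))
    # Sattolo's algorithm: j is drawn from [0, i), never i itself, so the
    # result is a single cycle and no index maps to itself.
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)
        perm[i], perm[j] = perm[j], perm[i]
    return [image_ids[p] for p in perm]


def corrupt_source(tokens: List[str], substitutions: Dict[str, str]) -> List[str]:
    """Textual adversary (hypothetical): replace the first token that appears
    in `substitutions`, leaving the rest of the sentence unchanged."""
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok in substitutions:
            out[i] = substitutions[tok]
            break
    return out


if __name__ == "__main__":
    print(shuffle_images(["img0", "img1", "img2", "img3"], seed=1))
    print(corrupt_source("a dog runs on the grass".split(), {"dog": "cat"}))
```

Translating both perturbed test sets with the same trained model, and comparing each score with the clean-input score, gives the kind of degradation comparison the abstract reports. The evaluation metric is left open here.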
Original language: English
Title of host publication: Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Number of pages: 6
Place of publication: Hong Kong, China
Publisher: Association for Computational Linguistics
Publication date: 1 Nov 2019
Publication status: Published, 1 Nov 2019
Event: First Workshop Beyond Vision and LANguage: inTEgrating Real-world kNowledge, Hong Kong, Hong Kong
Duration: 3 Nov 2019 → …


ID: 230850047