On the Limitations of Unsupervised Bilingual Dictionary Induction

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Anders Søgaard
Sebastian Ruder
Ivan Vulić

Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
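
As a rough illustration of the "identical words" signal mentioned in the abstract, the sketch below collects strings that occur in both monolingual vocabularies and treats them as a seed dictionary. The function name `identical_word_seed`, the `min_len` parameter, and the toy vocabularies are hypothetical and chosen only for this example; the record itself does not specify how the authors extract or use the seed.

```python
def identical_word_seed(src_vocab, tgt_vocab, min_len=1):
    """Return sorted (word, word) pairs for strings found in both vocabularies.

    Assumption for illustration: a word spelled identically in both
    languages is treated as its own translation.
    """
    shared = set(src_vocab) & set(tgt_vocab)
    return sorted((w, w) for w in shared if len(w) >= min_len)


# Toy, made-up vocabularies; not data from the paper.
english = ["the", "hotel", "taxi", "dog", "berlin"]
german = ["der", "hotel", "taxi", "hund", "berlin"]

seed = identical_word_seed(english, german)
print(seed)
```

A seed of this kind could then stand in for a hand-built dictionary when aligning two embedding spaces, which is why it counts as weak rather than no supervision.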

Original language	English
Title	Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: (Long papers)
Publisher	Association for Computational Linguistics
Publication date	2018
Pages	778–788
Status	Published - 2018
Event	56th Annual Meeting of the Association for Computational Linguistics - System Demonstrations - Melbourne, Australia. Duration: 15 Jul 2018 → 20 Jul 2018

Conference

Conference	56th Annual Meeting of the Association for Computational Linguistics - System Demonstrations
Country	Australia
City	Melbourne
Period	15/07/2018 → 20/07/2018

ID: 214756841

Datalogisk Institut
