Naive regularizers for low-resource neural machine translation

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios: they require large volumes of data and often perform poorly when only limited data is available. We show that using naive regularization methods, based on sentence length, punctuation, and word frequencies, to penalize translations that are very different from the input sentences consistently improves translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size between 17k and 230k sentence pairs. Our best regularizer achieves an average improvement of 1.5 BLEU and 1.0 TER points across all language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English-Vietnamese translation task simply by using relative differences in punctuation as a regularizer.
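The record does not spell out how these regularizers are computed; below is a minimal sketch of one plausible punctuation-based penalty, assuming a relative-count formulation. The function names, the example, and the loss weighting are illustrative assumptions, not the paper's actual implementation.

    import string

    PUNCT = set(string.punctuation)

    def punct_count(tokens):
        """Count punctuation tokens in a tokenized sentence."""
        return sum(1 for t in tokens if t in PUNCT)

    def punct_penalty(source_tokens, hyp_tokens):
        """Relative difference in punctuation counts between the source
        sentence and a translation hypothesis: 0.0 when the counts match,
        approaching 1.0 as they diverge."""
        s, h = punct_count(source_tokens), punct_count(hyp_tokens)
        if max(s, h) == 0:
            return 0.0
        return abs(s - h) / max(s, h)

    # Example: the hypothesis drops one of the source's two punctuation
    # marks, so the penalty is |2 - 1| / 2 = 0.5.
    src = "Hello , world !".split()
    hyp = "Bonjour monde !".split()
    print(punct_penalty(src, hyp))  # 0.5

    # Such a penalty could be added to the training loss with a weight
    # lam (a hypothetical hyperparameter; the paper's exact weighting is
    # not given in this record):
    #   loss = cross_entropy + lam * punct_penalty(src, hyp)

Analogous penalties based on relative differences in sentence length or word frequencies would follow the same pattern, swapping the punctuation count for the corresponding statistic.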

Original language: English
Title of host publication: International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
Editors: Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Number of pages: 10
Publisher: Incoma Ltd
Publication date: 2019
Pages: 102-111
ISBN (Electronic): 9789544520557
Publication status: Published - 2019
Event: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 - Varna, Bulgaria
Duration: 2 Sep 2019 - 4 Sep 2019

Conference

Conference: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
Country: Bulgaria
City: Varna
Period: 02/09/2019 - 04/09/2019
