Naive regularizers for low-resource neural machine translation

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Naive regularizers for low-resource neural machine translation. / Beloucif, Meriem; Gonzalez, Ana Valeria; Bollmann, Marcel; Søgaard, Anders.

International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. ed. / Galia Angelova; Ruslan Mitkov; Ivelina Nikolova; Irina Temnikova. Incoma Ltd, 2019. p. 102-111.

Harvard

Beloucif, M, Gonzalez, AV, Bollmann, M & Søgaard, A 2019, Naive regularizers for low-resource neural machine translation. in G Angelova, R Mitkov, I Nikolova & I Temnikova (eds), International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. Incoma Ltd, pp. 102-111, 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria, 02/09/2019. https://doi.org/10.26615/978-954-452-056-4_013

APA

Beloucif, M., Gonzalez, A. V., Bollmann, M., & Søgaard, A. (2019). Naive regularizers for low-resource neural machine translation. In G. Angelova, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings (pp. 102-111). Incoma Ltd. https://doi.org/10.26615/978-954-452-056-4_013

Vancouver

Beloucif M, Gonzalez AV, Bollmann M, Søgaard A. Naive regularizers for low-resource neural machine translation. In Angelova G, Mitkov R, Nikolova I, Temnikova I, editors, International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. Incoma Ltd. 2019. p. 102-111 https://doi.org/10.26615/978-954-452-056-4_013

Author

Beloucif, Meriem ; Gonzalez, Ana Valeria ; Bollmann, Marcel ; Søgaard, Anders. / Naive regularizers for low-resource neural machine translation. International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. editor / Galia Angelova ; Ruslan Mitkov ; Ivelina Nikolova ; Irina Temnikova. Incoma Ltd, 2019. pp. 102-111

Bibtex

@inproceedings{3820c76c23b44ab8b672ed494ca870e9,
title = "Naive regularizers for low-resource neural machine translation",
abstract = "Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. They require large volumes of data and often perform poorly when limited data is available. We show that using naive regularization methods, based on sentence length, punctuation and word frequencies, to penalize translations that are very different from the input sentences, consistently improves the translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size between 17k to 230k sentence pairs. Our best regularizer achieves an average increase of 1.5 BLEU score and 1.0 TER score across all the language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English-Vietnamese translation task simply by using relative differences in punctuation as a regularizer.",
author = "Meriem Beloucif and Gonzalez, {Ana Valeria} and Marcel Bollmann and Anders S{\o}gaard",
year = "2019",
doi = "10.26615/978-954-452-056-4_013",
language = "English",
pages = "102--111",
editor = "Galia Angelova and Ruslan Mitkov and Ivelina Nikolova and Irina Temnikova",
booktitle = "International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings",
publisher = "Incoma Ltd",
note = "12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 ; Conference date: 02-09-2019 Through 04-09-2019",

}

RIS

TY - GEN

T1 - Naive regularizers for low-resource neural machine translation

AU - Beloucif, Meriem

AU - Gonzalez, Ana Valeria

AU - Bollmann, Marcel

AU - Søgaard, Anders

PY - 2019

Y1 - 2019

AB - Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. They require large volumes of data and often perform poorly when limited data is available. We show that using naive regularization methods, based on sentence length, punctuation and word frequencies, to penalize translations that are very different from the input sentences, consistently improves the translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size between 17k to 230k sentence pairs. Our best regularizer achieves an average increase of 1.5 BLEU score and 1.0 TER score across all the language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English-Vietnamese translation task simply by using relative differences in punctuation as a regularizer.

UR - http://www.scopus.com/inward/record.url?scp=85076491475&partnerID=8YFLogxK

U2 - 10.26615/978-954-452-056-4_013

DO - 10.26615/978-954-452-056-4_013

M3 - Article in proceedings

AN - SCOPUS:85076491475

SP - 102

EP - 111

BT - International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings

A2 - Angelova, Galia

A2 - Mitkov, Ruslan

A2 - Nikolova, Ivelina

A2 - Temnikova, Irina

PB - Incoma Ltd

T2 - 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019

Y2 - 2 September 2019 through 4 September 2019

ER -

ID: 237806742