A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Standard

A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages. / Vania, Clara; Kementchedjhieva, Yova; Søgaard, Anders; Lopez, Adam.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2019. p. 1105-1116.

Harvard

Vania, C, Kementchedjhieva, Y, Søgaard, A & Lopez, A 2019, A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp. 1105-1116, 01/11/2019. https://doi.org/10.18653/v1/D19-1102

APA

Vania, C., Kementchedjhieva, Y., Søgaard, A., & Lopez, A. (2019). A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 1105-1116). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1102

Vancouver

Vania C, Kementchedjhieva Y, Søgaard A, Lopez A. A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics. 2019. p. 1105-1116. https://doi.org/10.18653/v1/D19-1102

Author

Vania, Clara ; Kementchedjhieva, Yova ; Søgaard, Anders ; Lopez, Adam. / A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2019. pp. 1105-1116

Bibtex

@inproceedings{e5a689f69f8e4c6a862e07c0b179b485,
title = "A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages",
abstract = "Parsers are available for only a handful of the world{\textquoteright}s languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages (North S{\'a}mi, Galician, and Kazakh), we find that (1) when only the low-resource treebank is available, data augmentation is very helpful; (2) when a related high-resource treebank is available, cross-lingual training is helpful and complements data augmentation; and (3) when the high-resource treebank uses a different writing system, transliteration into a shared orthographic space is also very helpful.",
author = "Clara Vania and Yova Kementchedjhieva and Anders S{\o}gaard and Adam Lopez",
year = "2019",
doi = "10.18653/v1/D19-1102",
language = "English",
pages = "1105--1116",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
publisher = "Association for Computational Linguistics",
note = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) ; Conference date: 01-11-2019 Through 01-11-2019",

}

RIS

TY - GEN

T1 - A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages

AU - Vania, Clara

AU - Kementchedjhieva, Yova

AU - Søgaard, Anders

AU - Lopez, Adam

PY - 2019

Y1 - 2019

N2 - Parsers are available for only a handful of the world’s languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages (North Sámi, Galician, and Kazakh), we find that (1) when only the low-resource treebank is available, data augmentation is very helpful; (2) when a related high-resource treebank is available, cross-lingual training is helpful and complements data augmentation; and (3) when the high-resource treebank uses a different writing system, transliteration into a shared orthographic space is also very helpful.

AB - Parsers are available for only a handful of the world’s languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages (North Sámi, Galician, and Kazakh), we find that (1) when only the low-resource treebank is available, data augmentation is very helpful; (2) when a related high-resource treebank is available, cross-lingual training is helpful and complements data augmentation; and (3) when the high-resource treebank uses a different writing system, transliteration into a shared orthographic space is also very helpful.

U2 - 10.18653/v1/D19-1102

DO - 10.18653/v1/D19-1102

M3 - Article in proceedings

SP - 1105

EP - 1116

BT - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

PB - Association for Computational Linguistics

T2 - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Y2 - 1 November 2019 through 1 November 2019

ER -
