Learning attention for historical text normalization by learning to pronounce

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Learning attention for historical text normalization by learning to pronounce. / Bollmann, Marcel; Bingel, Joachim; Søgaard, Anders.

ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics, 2017. p. 332-344.


Harvard

Bollmann, M, Bingel, J & Søgaard, A 2017, Learning attention for historical text normalization by learning to pronounce. in ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics, pp. 332-344, 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, 30/07/2017. https://doi.org/10.18653/v1/P17-1031

APA

Bollmann, M., Bingel, J., & Søgaard, A. (2017). Learning attention for historical text normalization by learning to pronounce. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (pp. 332-344). Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1031

Vancouver

Bollmann M, Bingel J, Søgaard A. Learning attention for historical text normalization by learning to pronounce. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics. 2017. p. 332-344 https://doi.org/10.18653/v1/P17-1031

Author

Bollmann, Marcel ; Bingel, Joachim ; Søgaard, Anders. / Learning attention for historical text normalization by learning to pronounce. ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics, 2017. pp. 332-344

BibTeX

@inproceedings{313c4646efbc4870b2ddc14669b15b12,
title = "Learning attention for historical text normalization by learning to pronounce",
abstract = "Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.",
author = "Marcel Bollmann and Joachim Bingel and Anders S{\o}gaard",
year = "2017",
month = jan,
day = "1",
doi = "10.18653/v1/P17-1031",
language = "English",
pages = "332--344",
booktitle = "ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)",
publisher = "Association for Computational Linguistics",
note = "55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 ; Conference date: 30-07-2017 Through 04-08-2017",

}
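The abstract describes a multi-task learning (MTL) setup in which an encoder-decoder for historical-to-modern normalization shares parameters with an auxiliary grapheme-to-phoneme task. The snippet below is a minimal, hypothetical sketch of that hard-parameter-sharing idea in PyTorch; the layer sizes, class and variable names, and the simplified decoders (no attention, no teacher forcing) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of hard parameter sharing for the setup described in the
# abstract: a shared character-level encoder, one decoder for historical->modern
# normalization, and one for the auxiliary grapheme-to-phoneme (G2P) task.
import torch
import torch.nn as nn


class SharedEncoderMTL(nn.Module):
    def __init__(self, src_vocab, norm_vocab, phon_vocab, emb=64, hid=128):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb)                 # shared character embeddings
        self.encoder = nn.LSTM(emb, hid, batch_first=True)        # shared encoder
        self.norm_decoder = nn.LSTM(hid, hid, batch_first=True)   # main-task decoder
        self.phon_decoder = nn.LSTM(hid, hid, batch_first=True)   # auxiliary G2P decoder
        self.norm_out = nn.Linear(hid, norm_vocab)
        self.phon_out = nn.Linear(hid, phon_vocab)

    def forward(self, chars, task):
        # chars: (batch, seq_len) tensor of character ids
        enc_states, _ = self.encoder(self.embed(chars))
        if task == "normalize":
            dec_states, _ = self.norm_decoder(enc_states)
            return self.norm_out(dec_states)
        else:  # task == "g2p"
            dec_states, _ = self.phon_decoder(enc_states)
            return self.phon_out(dec_states)


# Training would alternate batches from the normalization corpus and the
# grapheme-to-phoneme dictionary, so gradients from both tasks update the
# shared encoder.
model = SharedEncoderMTL(src_vocab=60, norm_vocab=60, phon_vocab=50)
logits = model(torch.randint(0, 60, (8, 12)), task="normalize")  # (8, 12, 60)
```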

RIS

TY - GEN

T1 - Learning attention for historical text normalization by learning to pronounce

AU - Bollmann, Marcel

AU - Bingel, Joachim

AU - Søgaard, Anders

PY - 2017/1/1

Y1 - 2017/1/1

N2 - Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.

AB - Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.

UR - http://www.scopus.com/inward/record.url?scp=85040931354&partnerID=8YFLogxK

U2 - 10.18653/v1/P17-1031

DO - 10.18653/v1/P17-1031

M3 - Article in proceedings

AN - SCOPUS:85040931354

SP - 332

EP - 344

BT - ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)

PB - Association for Computational Linguistics

T2 - 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017

Y2 - 30 July 2017 through 4 August 2017

ER -
