Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. / Alva-Manchego, Fernando; Bingel, Joachim; Paetzold, Gustavo H.; Scarton, Carolina; Specia, Lucia.

Proceedings of the 8th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 2017. p. 295–305.

Harvard

Alva-Manchego, F, Bingel, J, Paetzold, GH, Scarton, C & Specia, L 2017, Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. in Proceedings of the 8th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, pp. 295–305, 8th International Joint Conference on Natural Language Processing, Taipei, Taiwan, Province of China, 27/11/2017. <http://aclweb.org/anthology/I17-1000>

APA

Alva-Manchego, F., Bingel, J., Paetzold, G. H., Scarton, C., & Specia, L. (2017). Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. In Proceedings of the 8th International Joint Conference on Natural Language Processing (pp. 295–305). Asian Federation of Natural Language Processing. http://aclweb.org/anthology/I17-1000

Vancouver

Alva-Manchego F, Bingel J, Paetzold GH, Scarton C, Specia L. Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. In Proceedings of the 8th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing. 2017. p. 295–305

Author

Alva-Manchego, Fernando; Bingel, Joachim; Paetzold, Gustavo H.; Scarton, Carolina; Specia, Lucia. / Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. Proceedings of the 8th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 2017. pp. 295–305

Bibtex

@inproceedings{af42fe80f82f4611bf25f279e59df893,

title = "Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs",

abstract = "Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.",

author = "Fernando Alva-Manchego and Joachim Bingel and Paetzold, {Gustavo H.} and Carolina Scarton and Lucia Specia",

year = "2017",

language = "English",

isbn = "978-1-948087-00-1",

pages = "295–305",

booktitle = "Proceedings of the 8th International Joint Conference on Natural Language Processing",

publisher = "Asian Federation of Natural Language Processing",

note = "8th International Joint Conference on Natural Language Processing ; Conference date: 27-11-2017 Through 01-12-2017",

}

RIS

TY - GEN

T1 - Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

AU - Alva-Manchego, Fernando

AU - Bingel, Joachim

AU - Paetzold, Gustavo H.

AU - Scarton, Carolina

AU - Specia, Lucia

PY - 2017

Y1 - 2017

N2 - Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.

AB - Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.

M3 - Article in proceedings

SN - 978-1-948087-00-1

SP - 295

EP - 305

BT - Proceedings of the 8th International Joint Conference on Natural Language Processing

PB - Asian Federation of Natural Language Processing

T2 - 8th International Joint Conference on Natural Language Processing

Y2 - 27 November 2017 through 1 December 2017

ER -

ID: 194947661

Department of Computer Science