Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. / Alva-Manchego, Fernando; Bingel, Joachim; Paetzold, Gustavo H.; Scarton, Carolina; Specia, Lucia.
Proceedings of the 8th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 2017. p. 295–305.
RIS
TY - GEN
T1 - Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
AU - Alva-Manchego, Fernando
AU - Bingel, Joachim
AU - Paetzold, Gustavo H.
AU - Scarton, Carolina
AU - Specia, Lucia
PY - 2017
Y1 - 2017
N2 - Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
AB - Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
M3 - Article in proceedings
SN - 978-1-948087-00-1
SP - 295
EP - 305
BT - Proceedings of the 8th International Joint Conference on Natural Language Processing
PB - Asian Federation of Natural Language Processing
T2 - 8th International Joint Conference on Natural Language Processing
Y2 - 27 November 2017 through 1 December 2017
ER -
ID: 194947661
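The abstract describes automatically identifying simplification operations, such as deletions or substitutions, in a parallel corpus and then training a sequence labeler on those annotations. The sketch below is not the authors' procedure. It is a minimal illustration, assuming whitespace tokenization and Python's standard-library `difflib.SequenceMatcher` as a stand-in aligner, of how one complex-simple sentence pair can be turned into per-token operation labels. The function name `label_operations` and the label set `KEEP`/`DELETE`/`REPLACE` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical label set chosen for illustration; the paper's own
# operation inventory may differ.
KEEP, DELETE, REPLACE = "KEEP", "DELETE", "REPLACE"


def label_operations(complex_sent: str, simple_sent: str) -> list[tuple[str, str]]:
    """Assign one operation label to each token of the complex sentence.

    Tokens are aligned to the simplified sentence with difflib's
    matching-subsequence heuristic (exact token matches only), which
    stands in here for a proper monolingual word aligner.
    """
    src = complex_sent.split()
    tgt = simple_sent.split()
    labels: list[tuple[str, str]] = []
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            op = KEEP
        elif tag == "delete":
            op = DELETE
        elif tag == "replace":
            op = REPLACE
        else:
            # "insert" spans cover no complex-side tokens (i1 == i2),
            # so there is nothing on the source side to label.
            continue
        labels.extend((tok, op) for tok in src[i1:i2])
    return labels


if __name__ == "__main__":
    pairs = label_operations(
        "the committee postponed the decision indefinitely",
        "the group delayed the decision",
    )
    for tok, op in pairs:
        print(f"{tok}\t{op}")
```

On the example pair, `committee` and `postponed` are labeled REPLACE, `indefinitely` is labeled DELETE, and the remaining tokens are labeled KEEP. Per-token labels of this kind could serve as targets for a sequence-labeling model; a realistic pipeline would likely need a stronger aligner, since `difflib` only finds exact matches and reports a reordered token as a deletion plus an insertion rather than a move.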