Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

  • Fernando Alva-Manchego
  • Joachim Bingel
  • Gustavo H. Paetzold
  • Carolina Scarton
  • Lucia Specia
Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of text simplification (TS) into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
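
As a rough illustration of what identifying operations in a complex-simplified sentence pair could look like, the sketch below aligns the two token sequences with Python's standard `difflib` and assigns one coarse label to each token of the complex sentence. This is not the procedure used in the paper: the function name `label_operations`, the label set (`KEEP`, `DELETE`, `REPLACE`), and the use of an edit-script alignment are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def label_operations(complex_tokens, simple_tokens):
    """Label each token of the complex sentence with a coarse operation.

    Hypothetical helper for illustration only; the label names are
    assumptions, not the annotation scheme of the paper.
    """
    labels = ["KEEP"] * len(complex_tokens)
    # autojunk=False disables difflib's heuristic for treating frequent
    # items as junk in long sequences.
    matcher = SequenceMatcher(None, complex_tokens, simple_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            for i in range(i1, i2):
                labels[i] = "DELETE"
        elif tag == "replace":
            for i in range(i1, i2):
                labels[i] = "REPLACE"
        # "equal" spans stay KEEP; "insert" spans have no token on the
        # complex side, so this sketch does not label them.
    return labels


complex_sentence = "the feline sat on the mat quietly".split()
simple_sentence = "the cat sat on the mat".split()
for token, label in zip(complex_sentence, label_operations(complex_sentence, simple_sentence)):
    print(token, label)
```

Labels of this kind, one per token, are the sort of target that a sequence-labeling model can be trained to predict, which makes the learned transformations easier to inspect than the output of an end-to-end model.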
Original language: English
Title of host publication: Proceedings of the 8th International Joint Conference on Natural Language Processing
Publisher: Asian Federation of Natural Language Processing
Publication date: 2017
Pages: 295–305
ISBN (Print): 978-1-948087-00-1
Publication status: Published - 2017
Event: 8th International Joint Conference on Natural Language Processing - Taipei, Taiwan, Province of China
Duration: 27 Nov 2017 to 1 Dec 2017

Conference

Conference: 8th International Joint Conference on Natural Language Processing
Country: Taiwan, Province of China
City: Taipei
Period: 27/11/2017 to 01/12/2017

ID: 194947661