Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; Peer-reviewed

  • Fernando Alva-Manchego
  • Joachim Bingel
  • Gustavo H. Paetzold
  • Carolina Scarton
  • Lucia Specia
Current research in text simplification (TS) has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
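
To make the idea of operation labels concrete, here is a minimal sketch of how a complex-simplified sentence pair could be turned into per-token labels. It is not the authors' annotation procedure, which this record does not describe. It assumes whitespace tokenization, uses Python's standard `difflib.SequenceMatcher` as a stand-in for a proper word aligner, and the function name `label_operations` and the label set KEEP / DELETE / REPLACE are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def label_operations(complex_tokens, simple_tokens):
    """Assign one operation label to each token of the complex sentence.

    Illustrative assumption: the diff between the two token sequences
    approximates a word alignment. Labels are hypothetical, not the paper's.
    """
    labels = ["KEEP"] * len(complex_tokens)
    matcher = SequenceMatcher(a=complex_tokens, b=simple_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            # Tokens present only in the complex sentence.
            for i in range(i1, i2):
                labels[i] = "DELETE"
        elif tag == "replace":
            # Tokens rewritten as something else in the simplified sentence.
            for i in range(i1, i2):
                labels[i] = "REPLACE"
        # "insert" spans have no token on the complex side, so this sketch
        # ignores them; "equal" spans keep the default KEEP label.
    return labels


if __name__ == "__main__":
    complex_sentence = "the feline sat on the very old mat".split()
    simple_sentence = "the cat sat on the mat".split()
    for token, label in zip(complex_sentence,
                            label_operations(complex_sentence, simple_sentence)):
        print(f"{token}\t{label}")
```

Label sequences of this kind are what a sequence-labeling model would be trained to predict from the complex sentence alone. A realistic pipeline would also need to handle insertions and reordering, which this sketch leaves out.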
Original language: English
Title: Proceedings of the 8th International Joint Conference on Natural Language Processing
Publisher: Asian Federation of Natural Language Processing
Publication date: 2017
Pages: 295–305
ISBN (Print): 978-1-948087-00-1
Status: Published - 2017
Event: 8th International Joint Conference on Natural Language Processing - Taipei, Taiwan
Duration: 27 Nov 2017 to 1 Dec 2017

Conference

Conference: 8th International Joint Conference on Natural Language Processing
Country: Taiwan
City: Taipei
Period: 27/11/2017 to 01/12/2017

ID: 194947661