Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Fernando Alva-Manchego
Joachim Bingel
Gustavo H. Paetzold
Carolina Scarton
Lucia Specia

Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of text simplification (TS) into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
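
To make the idea of explicit operation labels concrete, here is a minimal sketch in Python. It is not the authors' annotation algorithm, which this record does not describe; it assumes a plain token-level diff from the standard library's `difflib` as a stand-in, and the label names `KEEP`, `DELETE`, and `REPLACE` as well as the function name `label_operations` are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def label_operations(complex_tokens, simple_tokens):
    """Assign one illustrative label per complex-side token.

    Assumption: a generic token diff approximates the alignment between
    the complex sentence and its simplification.
    """
    labels = ["KEEP"] * len(complex_tokens)
    matcher = SequenceMatcher(a=complex_tokens, b=simple_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            # Tokens present only on the complex side.
            labels[i1:i2] = ["DELETE"] * (i2 - i1)
        elif tag == "replace":
            # Tokens substituted by different tokens on the simple side.
            labels[i1:i2] = ["REPLACE"] * (i2 - i1)
        # "equal" spans stay KEEP; "insert" spans have no complex-side token.
    return labels


complex_sentence = "the feline perched upon the mat".split()
simple_sentence = "the cat sat on the mat".split()
labeled = list(zip(complex_sentence, label_operations(complex_sentence, simple_sentence)))
```

Label sequences of this kind, one label per token of the complex sentence, are the sort of target a sequence-labeling model can be trained to predict. A real pipeline would likely need a proper word aligner and a richer label set than this sketch assumes.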

Original language	English
Title	Proceedings of the 8th International Joint Conference on Natural Language Processing
Publisher	Asian Federation of Natural Language Processing
Publication date	2017
Pages	295–305
ISBN (Print)	978-1-948087-00-1
Status	Published - 2017
Event	8th International Joint Conference on Natural Language Processing - Taipei, Taiwan. Duration: 27 Nov 2017 → 1 Dec 2017

Conference

Conference	8th International Joint Conference on Natural Language Processing
Country	Taiwan
City	Taipei
Period	27/11/2017 → 01/12/2017

Department of Computer Science
