Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
Current research in text simplification (TS) has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
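To make the idea of identifying operations in a parallel corpus concrete, here is a minimal sketch that derives one label per token of the complex sentence by aligning it with its simplified counterpart. It is an illustration only and not necessarily how the paper derives its annotations: the function name `label_operations`, the label set (`KEEP`, `DELETE`, `REPLACE`), whitespace tokenization, and the use of Python's `difflib` for alignment are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def label_operations(complex_tokens, simple_tokens):
    """Assign one label per complex-side token: KEEP, DELETE, or REPLACE.

    Hypothetical helper for illustration; the label set is an assumption.
    """
    labels = []
    # autojunk=False disables difflib's popularity heuristic, which is
    # meant for long sequences and is not useful for single sentences.
    matcher = SequenceMatcher(a=complex_tokens, b=simple_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            labels.extend(["KEEP"] * (i2 - i1))
        elif tag == "delete":
            labels.extend(["DELETE"] * (i2 - i1))
        elif tag == "replace":
            labels.extend(["REPLACE"] * (i2 - i1))
        # "insert" opcodes span no complex-side tokens, so they add no label.
    return labels


complex_sentence = "The committee subsequently ratified the proposal".split()
simple_sentence = "The committee approved the proposal".split()
for token, label in zip(complex_sentence, label_operations(complex_sentence, simple_sentence)):
    print(f"{token}\t{label}")
```

Labels of this kind could serve as targets for a sequence labeler that predicts an operation for each input token. Material that appears only in the simplified sentence is ignored here, which is a deliberate simplification of the sketch.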
Original language | English |
---|---|
Title | Proceedings of the 8th International Joint Conference on Natural Language Processing |
Publisher | Asian Federation of Natural Language Processing |
Publication date | 2017 |
Pages | 295–305 |
ISBN (Print) | 978-1-948087-00-1 |
Status | Published - 2017 |
Event | 8th International Joint Conference on Natural Language Processing - Taipei, Taiwan. Duration: 27 Nov 2017 → 1 Dec 2017 |
Conference
Conference | 8th International Joint Conference on Natural Language Processing |
---|---|
Country | Taiwan |
City | Taipei |
Period | 27/11/2017 → 01/12/2017 |
Links
- http://aclweb.org/anthology/I17-1000
Publisher's published version
ID: 194947661