Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.
Original language | English |
---|---|
Title | Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing |
Number of pages | 11 |
Publisher | Association for Computational Linguistics |
Publication date | 2017 |
Pages | 1615–1625 |
Status | Published - 2017 |
Event | 2017 Conference on Empirical Methods in Natural Language Processing - Copenhagen, Denmark Duration: 9 Sep 2017 → 11 Sep 2017 |
Conference
Conference | 2017 Conference on Empirical Methods in Natural Language Processing |
---|---|
Country | Denmark |
City | Copenhagen |
Period | 09/09/2017 → 11/09/2017 |
Links
- http://www.aclweb.org/anthology/D17-1169
Publisher's published version
ID: 195015150