Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. / Felbo, Bjarke ; Mislove, Alan ; Søgaard, Anders ; Rahwan, Iyad ; Lehmann, Sune.

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017. p. 1615–1625.

Harvard

Felbo, B, Mislove, A, Søgaard, A, Rahwan, I & Lehmann, S 2017, Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1615–1625, 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 09/09/2017. <http://www.aclweb.org/anthology/D17-1169>

APA

Felbo, B., Mislove, A., Søgaard, A., Rahwan, I., & Lehmann, S. (2017). Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 1615–1625). Association for Computational Linguistics. http://www.aclweb.org/anthology/D17-1169

Vancouver

Felbo B, Mislove A, Søgaard A, Rahwan I, Lehmann S. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. 2017. p. 1615–1625

Author

Felbo, Bjarke ; Mislove, Alan ; Søgaard, Anders ; Rahwan, Iyad ; Lehmann, Sune. / Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017. pp. 1615–1625

Bibtex

@inproceedings{542866c661164f09a7e945de5de6144f,

title = "Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm",

abstract = "NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yield a performance improvement over previous distant supervision approaches.",

author = "Bjarke Felbo and Alan Mislove and Anders S{\o}gaard and Iyad Rahwan and Sune Lehmann",

year = "2017",

language = "English",

pages = "1615–1625",

booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",

publisher = "Association for Computational Linguistics",

note = "2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017 ; Conference date: 09-09-2017 Through 11-09-2017",

}

RIS

TY - GEN

T1 - Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

AU - Felbo, Bjarke

AU - Mislove, Alan

AU - Søgaard, Anders

AU - Rahwan, Iyad

AU - Lehmann, Sune

PY - 2017

Y1 - 2017

N2 - NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yield a performance improvement over previous distant supervision approaches.

AB - NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yield a performance improvement over previous distant supervision approaches.

M3 - Article in proceedings

SP - 1615

EP - 1625

BT - Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

PB - Association for Computational Linguistics

T2 - 2017 Conference on Empirical Methods in Natural Language Processing

Y2 - 9 September 2017 through 11 September 2017

ER -

ID: 195015150

Department of Computer Science