Semantic Textual Similarity of Sentences with Emojis

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Alok Debnath
Nikhil Pinnaparaju
Manish Shrivastava
Vasudeva Varma
Isabelle Augenstein

In this paper, we extend the task of semantic textual similarity to sentences that contain emojis. Emojis are ubiquitous on social media today, but they are often removed during pre-processing when datasets for NLP tasks are curated. We qualitatively assess how much semantic information is lost by disregarding emojis, and we show a mechanism for accounting for emojis in a semantic task. We create a sentence similarity dataset of 4000 pairs of tweets with emojis, annotated for relatedness. The corpus contains tweets curated both by common topic and by replacement of emojis. The latter was done to analyze how the semantics associated with different emojis differ. Through a qualitative analysis of the dataset, we aim to show what information is lost when emojis are removed. We also aim to present a method of using both emojis and words for downstream NLP tasks beyond sentiment analysis.
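To illustrate two points from the abstract (emojis being discarded during pre-processing, and tweet pairs built by replacing emojis), the following is a minimal hypothetical Python sketch. The regular expression, `strip_emojis`, and `replacement_pairs` are assumptions made for illustration only; they are not the authors' curation pipeline, and the pattern covers just a few Unicode emoji blocks rather than full emoji sequences.

```python
import re

# Hypothetical, simplified emoji pattern covering a few common Unicode blocks.
# Real emoji detection (ZWJ sequences, skin-tone modifiers, flags) needs more.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001F5FF"   # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"    # emoticons
    "\U0001F680-\U0001F6FF"    # transport and map symbols
    "\U0001F900-\U0001F9FF"    # supplemental symbols and pictographs
    "\u2600-\u27BF]"           # miscellaneous symbols and dingbats
)


def strip_emojis(text: str) -> str:
    """Mimic a typical pre-processing step: drop emojis, collapse whitespace."""
    return " ".join(EMOJI_PATTERN.sub("", text).split())


def replacement_pairs(tweet: str, candidates: list[str]) -> list[tuple[str, str]]:
    """Build (original, variant) pairs by swapping one emoji at a time
    for each other candidate emoji (hypothetical pairing scheme)."""
    pairs = []
    for match in EMOJI_PATTERN.finditer(tweet):
        for substitute in candidates:
            if substitute == match.group():
                continue  # skip the identity replacement
            variant = tweet[:match.start()] + substitute + tweet[match.end():]
            pairs.append((tweet, variant))
    return pairs


if __name__ == "__main__":
    tweet = "Finally Friday 😍"
    for original, variant in replacement_pairs(tweet, ["😍", "😡", "😂"]):
        # After stripping, both sides collapse to the same text,
        # so any difference carried by the emoji is no longer visible.
        print(original, "|", variant, "|", strip_emojis(variant))
```

Under these assumptions, "Finally Friday 😍" and "Finally Friday 😡" become identical once emojis are stripped, even though a reader would likely judge them quite differently; this is the kind of loss the dataset is designed to expose.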

Original language	English
Title of host publication	The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020
Publisher	Association for Computing Machinery
Publication date	2020
Pages	426-430
ISBN (Electronic)	9781450370240
DOIs	https://doi.org/10.1145/3366424.3383758
Publication status	Published - 2020
Event	29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan, Province of China
Duration	20 Apr 2020 → 24 Apr 2020

Conference

Conference	29th International World Wide Web Conference, WWW 2020
Country	Taiwan, Province of China
City	Taipei
Period	20/04/2020 → 24/04/2020
Sponsor	Chunghwa Telecom, et al., Microsoft, Quanta Computer, Taiwan Mobile, ZOOM

Research areas

datasets, emoji, sentence similarity


ID: 250434538

Department of Computer Science