Semantic Textual Similarity of Sentences with Emojis

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Semantic Textual Similarity of Sentences with Emojis. / Debnath, Alok; Pinnaparaju, Nikhil; Shrivastava, Manish; Varma, Vasudeva; Augenstein, Isabelle.

The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020. Association for Computing Machinery, 2020. p. 426-430.


Harvard

Debnath, A, Pinnaparaju, N, Shrivastava, M, Varma, V & Augenstein, I 2020, Semantic Textual Similarity of Sentences with Emojis. in The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020. Association for Computing Machinery, pp. 426-430, 29th International World Wide Web Conference, WWW 2020, Taipei, Taiwan, Province of China, 20/04/2020. https://doi.org/10.1145/3366424.3383758

APA

Debnath, A., Pinnaparaju, N., Shrivastava, M., Varma, V., & Augenstein, I. (2020). Semantic Textual Similarity of Sentences with Emojis. In The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020 (pp. 426-430). Association for Computing Machinery. https://doi.org/10.1145/3366424.3383758

Vancouver

Debnath A, Pinnaparaju N, Shrivastava M, Varma V, Augenstein I. Semantic Textual Similarity of Sentences with Emojis. In The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020. Association for Computing Machinery. 2020. p. 426-430. https://doi.org/10.1145/3366424.3383758

Author

Debnath, Alok ; Pinnaparaju, Nikhil ; Shrivastava, Manish ; Varma, Vasudeva ; Augenstein, Isabelle. / Semantic Textual Similarity of Sentences with Emojis. The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020. Association for Computing Machinery, 2020. pp. 426-430

Bibtex

@inproceedings{932ffc8a307845b292ea39db6b691ccd,
  title = "Semantic Textual Similarity of Sentences with Emojis",
  abstract = "In this paper, we extend the task of semantic textual similarity to include sentences which contain emojis. Emojis are ubiquitous on social media today, but are often removed in the pre-processing stage of curating datasets for NLP tasks. In this paper, we qualitatively ascertain the amount of semantic information lost by discounting emojis, as well as show a mechanism of accounting for emojis in a semantic task. We create a sentence similarity dataset of 4000 pairs of tweets with emojis, which have been annotated for relatedness. The corpus contains tweets curated based on common topic as well as by replacement of emojis. The latter was done to analyze the difference in semantics associated with different emojis. We aim to provide an understanding of the information lost by removing emojis by providing a qualitative analysis of the dataset. We also aim to present a method of using both emojis and words for downstream NLP tasks beyond sentiment analysis.",
  keywords = "datasets, emoji, sentence similarity",
  author = "Alok Debnath and Nikhil Pinnaparaju and Manish Shrivastava and Vasudeva Varma and Isabelle Augenstein",
  year = "2020",
  doi = "10.1145/3366424.3383758",
  language = "English",
  pages = "426--430",
  booktitle = "The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020",
  publisher = "Association for Computing Machinery",
  note = "29th International World Wide Web Conference, WWW 2020 ; Conference date: 20-04-2020 Through 24-04-2020",
}

RIS

TY  - GEN
T1  - Semantic Textual Similarity of Sentences with Emojis
AU  - Debnath, Alok
AU  - Pinnaparaju, Nikhil
AU  - Shrivastava, Manish
AU  - Varma, Vasudeva
AU  - Augenstein, Isabelle
PY  - 2020
Y1  - 2020
N2  - In this paper, we extend the task of semantic textual similarity to include sentences which contain emojis. Emojis are ubiquitous on social media today, but are often removed in the pre-processing stage of curating datasets for NLP tasks. In this paper, we qualitatively ascertain the amount of semantic information lost by discounting emojis, as well as show a mechanism of accounting for emojis in a semantic task. We create a sentence similarity dataset of 4000 pairs of tweets with emojis, which have been annotated for relatedness. The corpus contains tweets curated based on common topic as well as by replacement of emojis. The latter was done to analyze the difference in semantics associated with different emojis. We aim to provide an understanding of the information lost by removing emojis by providing a qualitative analysis of the dataset. We also aim to present a method of using both emojis and words for downstream NLP tasks beyond sentiment analysis.
AB  - In this paper, we extend the task of semantic textual similarity to include sentences which contain emojis. Emojis are ubiquitous on social media today, but are often removed in the pre-processing stage of curating datasets for NLP tasks. In this paper, we qualitatively ascertain the amount of semantic information lost by discounting emojis, as well as show a mechanism of accounting for emojis in a semantic task. We create a sentence similarity dataset of 4000 pairs of tweets with emojis, which have been annotated for relatedness. The corpus contains tweets curated based on common topic as well as by replacement of emojis. The latter was done to analyze the difference in semantics associated with different emojis. We aim to provide an understanding of the information lost by removing emojis by providing a qualitative analysis of the dataset. We also aim to present a method of using both emojis and words for downstream NLP tasks beyond sentiment analysis.
KW  - datasets
KW  - emoji
KW  - sentence similarity
UR  - http://www.scopus.com/inward/record.url?scp=85091704665&partnerID=8YFLogxK
U2  - 10.1145/3366424.3383758
DO  - 10.1145/3366424.3383758
M3  - Article in proceedings
AN  - SCOPUS:85091704665
SP  - 426
EP  - 430
BT  - The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020
PB  - Association for Computing Machinery
T2  - 29th International World Wide Web Conference, WWW 2020
Y2  - 20 April 2020 through 24 April 2020
ER  - 


Department of Computer Science