Evaluating hypotheses in geolocation on a very large sample of Twitter

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Standard

Evaluating hypotheses in geolocation on a very large sample of Twitter. / Salehi, Bahar; Søgaard, Anders.

Proceedings of the 3rd Workshop on Noisy User-generated Text. Association for Computational Linguistics, 2017. p. 62-67.

Harvard

Salehi, B & Søgaard, A 2017, Evaluating hypotheses in geolocation on a very large sample of Twitter. in Proceedings of the 3rd Workshop on Noisy User-generated Text. Association for Computational Linguistics, pp. 62-67, 3rd Workshop on Noisy User-generated Text, Copenhagen, Denmark, 07/09/2017. <http://www.aclweb.org/anthology/W17-4409>

APA

Salehi, B., & Søgaard, A. (2017). Evaluating hypotheses in geolocation on a very large sample of Twitter. In Proceedings of the 3rd Workshop on Noisy User-generated Text (pp. 62-67). Association for Computational Linguistics. http://www.aclweb.org/anthology/W17-4409

Vancouver

Salehi B, Søgaard A. Evaluating hypotheses in geolocation on a very large sample of Twitter. In Proceedings of the 3rd Workshop on Noisy User-generated Text. Association for Computational Linguistics. 2017. p. 62-67

Author

Salehi, Bahar ; Søgaard, Anders. / Evaluating hypotheses in geolocation on a very large sample of Twitter. Proceedings of the 3rd Workshop on Noisy User-generated Text. Association for Computational Linguistics, 2017. pp. 62-67

Bibtex

@inproceedings{827907476947411a8ab6ea6190bf63ce,
title = "Evaluating hypotheses in geolocation on a very large sample of Twitter",
abstract = "Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.",
author = "Bahar Salehi and Anders S{\o}gaard",
year = "2017",
language = "English",
isbn = "978-1-945626-94-4",
pages = "62--67",
booktitle = "Proceedings of the 3rd Workshop on Noisy User-generated Text",
publisher = "Association for Computational Linguistics",
note = "3rd Workshop on Noisy User-generated Text, W-NUT 2017 ; Conference date: 07-09-2017 Through 07-09-2017",

}

RIS

TY - GEN

T1 - Evaluating hypotheses in geolocation on a very large sample of Twitter

AU - Salehi, Bahar

AU - Søgaard, Anders

PY - 2017

Y1 - 2017

N2 - Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.

AB - Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.

M3 - Article in proceedings

SN - 978-1-945626-94-4

SP - 62

EP - 67

BT - Proceedings of the 3rd Workshop on Noisy User-generated Text

PB - Association for Computational Linguistics

T2 - 3rd Workshop on Noisy User-generated Text

Y2 - 7 September 2017 through 7 September 2017

ER -

ID: 195014345