Evaluating hypotheses in geolocation on a very large sample of Twitter

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Bahar Salehi
Anders Søgaard

Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.

Original language	English
Title	Proceedings of the 3rd Workshop on Noisy User-generated Text
Number of pages	6
Publisher	Association for Computational Linguistics
Publication date	2017
Pages	62-67
ISBN (Print)	978-1-945626-94-4
Status	Published - 2017
Event	3rd Workshop on Noisy User-generated Text - Copenhagen, Denmark Duration: 7 Sep 2017 → 7 Sep 2017

Conference

Conference	3rd Workshop on Noisy User-generated Text
Country	Denmark
City	Copenhagen
Period	07/09/2017 → 07/09/2017

Department of Computer Science
