Crowdsourcing and annotating NER for Twitter #drift
Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed
Standard
Crowdsourcing and annotating NER for Twitter #drift. / Fromreide, Hege; Hovy, Dirk; Søgaard, Anders.
Proceedings of the 9th International Conference on Language Resources and Evaluation: LREC2014. European Language Resources Association, 2014.
RIS
TY - GEN
T1 - Crowdsourcing and annotating NER for Twitter #drift
AU - Fromreide, Hege
AU - Hovy, Dirk
AU - Søgaard, Anders
PY - 2014
Y1 - 2014
N2 - We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER-annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; and (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to “catch up” with language drift.
M3 - Article in proceedings
BT - Proceedings of the 9th International Conference on Language Resources and Evaluation
PB - European Language Resources Association
ER -
ID: 105105333