Learning a POS tagger for AAVE-like language

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Anna Jørgensen
Dirk Hovy
Anders Søgaard

POS taggers trained on newswire perform much worse on domains such as subtitles, lyrics, and tweets. In addition, these domains are very heterogeneous, and it is not clear what data to annotate to learn a POS tagger for subtitles, for example. In this paper we consider the problem of learning a POS tagger for subtitles, lyrics, and tweets associated with African-American Vernacular English from a previously released and manually annotated Twitter corpus. Our approach is to learn from a mixture of this data and unlabeled data, which was automatically and partially labeled using mined tag dictionaries. Our POS tagger obtains a tagging accuracy of 89% on subtitles, 85% on lyrics, and 83% on tweets, with up to 55% error reductions over a state-of-the-art newswire POS tagger, and 15-25% error reductions over a state-of-the-art Twitter POS tagger.
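The error reductions quoted above are relative figures: they measure what share of a baseline tagger's errors the new tagger removes, not the difference in accuracy points. The sketch below shows that arithmetic. The function name and the baseline accuracy are hypothetical and chosen only for illustration; the abstract does not report the baseline accuracies, so the printed value is not a result from the paper.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Return the fraction of the baseline's errors removed by the new system.

    Both arguments are accuracies in [0, 1]; the baseline must make at
    least one error, otherwise the ratio is undefined.
    """
    if not (0.0 <= baseline_acc < 1.0):
        raise ValueError("baseline_acc must be in [0, 1)")
    if not (0.0 <= new_acc <= 1.0):
        raise ValueError("new_acc must be in [0, 1]")
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# ASSUMPTION: 0.80 is a made-up baseline accuracy, not a figure from the paper.
hypothetical_baseline_acc = 0.80
# 0.89 is the subtitle accuracy reported in the abstract.
reported_subtitle_acc = 0.89

reduction = relative_error_reduction(hypothetical_baseline_acc, reported_subtitle_acc)
print(f"Relative error reduction: {reduction:.0%}")
```

Under this definition, a small gain in accuracy points can still be a large relative reduction when the baseline is already accurate, which is why the two kinds of figures should not be compared directly.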

Original language	English
Title	Proceedings of NAACL-HLT 2016
Number of pages	6
Publisher	Association for Computational Linguistics
Publication date	2016
Pages	1115-1120
ISBN (Electronic)	978-1-941643-91-4
Status	Published - 2016
Event	NAACL - San Diego, USA. Duration: 12 Jun 2016 → 17 Jun 2016

Conference

Conference	NAACL
Location	San Diego
Country	USA
City	San Diego
Period	12/06/2016 → 17/06/2016

Department of Computer Science (Datalogisk Institut)
