Learning a POS tagger for AAVE-like language
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Learning a POS tagger for AAVE-like language. / Jørgensen, Anna; Hovy, Dirk; Søgaard, Anders.
Proceedings of NAACL-HLT 2016. Association for Computational Linguistics, 2016. p. 1115-1120.
RIS
TY - GEN
T1 - Learning a POS tagger for AAVE-like language
AU - Jørgensen, Anna
AU - Hovy, Dirk
AU - Søgaard, Anders
PY - 2016
Y1 - 2016
N2 - POS taggers trained on newswire perform much worse on domains such as subtitles, lyrics, and tweets. In addition, these domains are very heterogeneous, and it is not clear what data to annotate to learn a POS tagger for subtitles, for example. In this paper we consider the problem of learning a POS tagger for subtitles, lyrics, and tweets associated with African-American Vernacular English from a previously released and manually annotated Twitter corpus. Our approach is to learn from a mixture of this data and unlabeled data, which was automatically and partially labeled using mined tag dictionaries. Our POS tagger obtains a tagging accuracy of 89% on subtitles, 85% on lyrics, and 83% on tweets, with up to 55% error reductions over a state-of-the-art newswire POS tagger, and 15-25% error reductions over a state-of-the-art Twitter POS tagger.
M3 - Article in proceedings
SP - 1115
EP - 1120
BT - Proceedings of NAACL-HLT 2016
PB - Association for Computational Linguistics
T2 - NAACL
Y2 - 12 June 2016 through 17 June 2016
ER -
ID: 167551969