Learning a POS tagger for AAVE-like language

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Learning a POS tagger for AAVE-like language. / Jørgensen, Anna; Hovy, Dirk; Søgaard, Anders.

Proceedings of NAACL-HLT 2016. Association for Computational Linguistics, 2016. p. 1115-1120.


Harvard

Jørgensen, A, Hovy, D & Søgaard, A 2016, Learning a POS tagger for AAVE-like language. in Proceedings of NAACL-HLT 2016. Association for Computational Linguistics, pp. 1115-1120, NAACL, San Diego, United States, 12/06/2016. <http://www.aclweb.org/anthology/N16-1130>

APA

Jørgensen, A., Hovy, D., & Søgaard, A. (2016). Learning a POS tagger for AAVE-like language. In Proceedings of NAACL-HLT 2016 (pp. 1115-1120). Association for Computational Linguistics. http://www.aclweb.org/anthology/N16-1130

Vancouver

Jørgensen A, Hovy D, Søgaard A. Learning a POS tagger for AAVE-like language. In Proceedings of NAACL-HLT 2016. Association for Computational Linguistics. 2016. p. 1115-1120

Author

Jørgensen, Anna ; Hovy, Dirk ; Søgaard, Anders. / Learning a POS tagger for AAVE-like language. Proceedings of NAACL-HLT 2016. Association for Computational Linguistics, 2016. pp. 1115-1120

BibTeX

@inproceedings{4c33222bd9f64c7b87e2c0c64513b123,
title = "Learning a POS tagger for AAVE-like language",
abstract = "POS taggers trained on newswire perform much worse on domains such as subtitles, lyrics, and tweets. In addition, these domains are very heterogeneous, and it is not clear what data to annotate to learn a POS tagger for subtitles, for example. In this paper we consider the problem of learning a POS tagger for subtitles, lyrics, and tweets associated with African-American Vernacular English from a previously released and manually annotated Twitter corpus. Our approach is to learn from a mixture of this data and unlabeled data, which was automatically and partially labeled using mined tag dictionaries. Our POS tagger obtains a tagging accuracy of 89% on subtitles, 85% on lyrics, and 83% on tweets, with up to 55% error reductions over a state-of-the-art newswire POS tagger, and 15-25% error reductions over a state-of-the-art Twitter POS tagger.",
author = "Anna J{\o}rgensen and Dirk Hovy and Anders S{\o}gaard",
year = "2016",
language = "English",
pages = "1115--1120",
booktitle = "Proceedings of NAACL-HLT 2016",
publisher = "Association for Computational Linguistics",
note = "NAACL ; Conference date: 12-06-2016 through 17-06-2016",
url = "http://www.aclweb.org/anthology/N16-1130",
}

RIS

TY  - GEN
T1  - Learning a POS tagger for AAVE-like language
AU  - Jørgensen, Anna
AU  - Hovy, Dirk
AU  - Søgaard, Anders
PY  - 2016
Y1  - 2016
N2  - POS taggers trained on newswire perform much worse on domains such as subtitles, lyrics, and tweets. In addition, these domains are very heterogeneous, and it is not clear what data to annotate to learn a POS tagger for subtitles, for example. In this paper we consider the problem of learning a POS tagger for subtitles, lyrics, and tweets associated with African-American Vernacular English from a previously released and manually annotated Twitter corpus. Our approach is to learn from a mixture of this data and unlabeled data, which was automatically and partially labeled using mined tag dictionaries. Our POS tagger obtains a tagging accuracy of 89% on subtitles, 85% on lyrics, and 83% on tweets, with up to 55% error reductions over a state-of-the-art newswire POS tagger, and 15-25% error reductions over a state-of-the-art Twitter POS tagger.
AB  - POS taggers trained on newswire perform much worse on domains such as subtitles, lyrics, and tweets. In addition, these domains are very heterogeneous, and it is not clear what data to annotate to learn a POS tagger for subtitles, for example. In this paper we consider the problem of learning a POS tagger for subtitles, lyrics, and tweets associated with African-American Vernacular English from a previously released and manually annotated Twitter corpus. Our approach is to learn from a mixture of this data and unlabeled data, which was automatically and partially labeled using mined tag dictionaries. Our POS tagger obtains a tagging accuracy of 89% on subtitles, 85% on lyrics, and 83% on tweets, with up to 55% error reductions over a state-of-the-art newswire POS tagger, and 15-25% error reductions over a state-of-the-art Twitter POS tagger.
M3  - Article in proceedings
SP  - 1115
EP  - 1120
BT  - Proceedings of NAACL-HLT 2016
PB  - Association for Computational Linguistics
T2  - NAACL
Y2  - 12 June 2016 through 17 June 2016
ER  - 

ID: 167551969
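The abstract names two technical ingredients: unlabeled text that is automatically and partially labeled using mined tag dictionaries, and results reported as relative error reductions. The Python sketch below illustrates both under stated assumptions; it is not the authors' code. The function names partially_label and error_reduction, the toy dictionary, and the rule that a token is labeled only when the dictionary lists exactly one tag for it are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mined tag dictionary: lowercased word -> set of admissible POS tags.
TagDict = Dict[str, Set[str]]


def partially_label(tokens: List[str], tag_dict: TagDict) -> List[Optional[str]]:
    """Label a token only when the dictionary allows exactly one tag for it.

    Unknown or ambiguous tokens get None, i.e. they carry no supervision.
    This one-tag rule is an illustrative assumption, not the paper's procedure.
    """
    labels: List[Optional[str]] = []
    for tok in tokens:
        tags = tag_dict.get(tok.lower())
        if tags is not None and len(tags) == 1:
            labels.append(next(iter(tags)))  # the single admissible tag
        else:
            labels.append(None)  # unseen or ambiguous word: leave unlabeled
    return labels


def error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's tagging errors removed by the new tagger."""
    baseline_err = 1.0 - baseline_acc
    if baseline_err <= 0.0:
        raise ValueError("baseline makes no errors, so the reduction is undefined")
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


if __name__ == "__main__":
    # Toy dictionary and toy accuracies, not data from the paper.
    toy_dict: TagDict = {"the": {"DET"}, "dog": {"NOUN"}, "run": {"VERB", "NOUN"}}
    print(partially_label(["The", "dog", "will", "run"], toy_dict))
    # prints ['DET', 'NOUN', None, None]
    print(round(error_reduction(0.80, 0.89), 2))
    # prints 0.45
```

In this sketch, ambiguous or unseen words stay unlabeled rather than being guessed, which keeps the automatic supervision conservative. The error-reduction function uses the usual definition, (baseline error - new error) / baseline error; the accuracies in the example are toy values, not figures from the paper.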