Part of Speech Based Term Weighting for Information Retrieval

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Standard

Part of Speech Based Term Weighting for Information Retrieval. / Lioma, Christina; Blanco, Roi.

ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval. 2009. pp. 412-423.

Harvard

Lioma, C & Blanco, R 2009, Part of Speech Based Term Weighting for Information Retrieval. in ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval. pp. 412-423. <http://64.238.147.53/citation.cfm?id=1533720.1533768&coll=DL&dl=GUIDE&CFID=87655016&CFTOKEN=30826131>

APA

Lioma, C., & Blanco, R. (2009). Part of Speech Based Term Weighting for Information Retrieval. In ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval (pp. 412-423). http://64.238.147.53/citation.cfm?id=1533720.1533768&coll=DL&dl=GUIDE&CFID=87655016&CFTOKEN=30826131

Vancouver

Lioma C, Blanco R. Part of Speech Based Term Weighting for Information Retrieval. In ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval. 2009. p. 412-423.

Author

Lioma, Christina; Blanco, Roi. / Part of Speech Based Term Weighting for Information Retrieval. ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval. 2009. pp. 412-423.

Bibtex

@inproceedings{8fca3dafe42a4c2ea4dfd2210c53ec97,
title = "Part of Speech Based Term Weighting for Information Retrieval",
abstract = "Automatic language processing tools typically assign to terms so-called `weights' corresponding to the contribution of terms to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the `POS contexts' in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval, by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF \& BM25 as baselines, show that integrating our POS-based term weights to retrieval always leads to gains (up to +33.7\% from the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet priors smoothing) and our best performing POS-based term weight, show retrieval gains always and consistently across the whole smoothing range of the baseline.",
author = "Christina Lioma and Roi Blanco",
note = "Published in: ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, pages 412--423. Springer-Verlag, Berlin, Heidelberg, {\textcopyright} 2009. ISBN: 978-3-642-00957-0. doi: 10.1007/978-3-642-00958-7_37",
year = "2009",
language = "English",
pages = "412--423",
booktitle = "ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval",

}

RIS

TY - CONF

T1 - Part of Speech Based Term Weighting for Information Retrieval

AU - Lioma, Christina

AU - Blanco, Roi

N1 - Published in: ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, pages 412-423. Springer-Verlag, Berlin, Heidelberg, ©2009. ISBN: 978-3-642-00957-0. doi: 10.1007/978-3-642-00958-7_37

PY - 2009

Y1 - 2009

AB - Automatic language processing tools typically assign to terms so-called 'weights' corresponding to the contribution of terms to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the 'POS contexts' in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval, by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF & BM25 as baselines, show that integrating our POS-based term weights to retrieval always leads to gains (up to +33.7% from the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet priors smoothing) and our best performing POS-based term weight, show retrieval gains always and consistently across the whole smoothing range of the baseline.

M3 - Article in proceedings

SP - 412

EP - 423

BT - ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval

ER -

ID: 38252017