Examining the content load of part of speech blocks for information retrieval

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; peer-reviewed

Standard

Examining the content load of part of speech blocks for information retrieval. / Lioma, Christina; Ounis, Iadh.

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference. Association for Computational Linguistics, 2006. pp. 531-538.

Harvard

Lioma, C & Ounis, I 2006, Examining the content load of part of speech blocks for information retrieval. in COLING-ACL '06 Proceedings of the COLING/ACL on Main conference. Association for Computational Linguistics, pp. 531-538. <http://dl.acm.org/citation.cfm?id=1273142>

APA

Lioma, C., & Ounis, I. (2006). Examining the content load of part of speech blocks for information retrieval. In COLING-ACL '06 Proceedings of the COLING/ACL on Main conference (pp. 531-538). Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1273142

Vancouver

Lioma C, Ounis I. Examining the content load of part of speech blocks for information retrieval. In COLING-ACL '06 Proceedings of the COLING/ACL on Main conference. Association for Computational Linguistics. 2006. p. 531-538

Author

Lioma, Christina ; Ounis, Iadh. / Examining the content load of part of speech blocks for information retrieval. COLING-ACL '06 Proceedings of the COLING/ACL on Main conference. Association for Computational Linguistics, 2006. pp. 531-538

Bibtex

@inproceedings{0b48fcbca61348908b1109cb3cb6d638,
title = "Examining the content load of part of speech blocks for information retrieval",
abstract = "We investigate the connection between part of speech (POS) distribution and content in language. We define POS blocks to be groups of parts of speech. We hypothesise that there exists a directly proportional relation between the frequency of POS blocks and their content salience. We also hypothesise that the class membership of the parts of speech within such blocks reflects the content load of the blocks, on the basis that open class parts of speech are more content-bearing than closed class parts of speech. We test these hypotheses in the context of Information Retrieval, by syntactically representing queries, and removing from them content-poor blocks, in line with the aforementioned hypotheses. For our first hypothesis, we induce POS distribution information from a corpus, and approximate the probability of occurrence of POS blocks as per two statistical estimators separately. For our second hypothesis, we use simple heuristics to estimate the content load within POS blocks. We use the Text REtrieval Conference (TREC) queries of 1999 and 2000 to retrieve documents from the WT2G and WT10G test collections, with five different retrieval strategies. Experimental outcomes confirm that our hypotheses hold in the context of Information Retrieval.",
author = "Christina Lioma and Iadh Ounis",
note = "Sydney, July 2006. {\copyright} 2006 Association for Computational Linguistics",
year = "2006",
language = "English",
pages = "531--538",
booktitle = "COLING-ACL '06 Proceedings of the COLING/ACL on Main conference",
publisher = "Association for Computational Linguistics",

}

RIS

TY - CONF

T1 - Examining the content load of part of speech blocks for information retrieval

AU - Lioma, Christina

AU - Ounis, Iadh

N1 - Sydney, July 2006. © 2006 Association for Computational Linguistics

PY - 2006

Y1 - 2006

AB - We investigate the connection between part of speech (POS) distribution and content in language. We define POS blocks to be groups of parts of speech. We hypothesise that there exists a directly proportional relation between the frequency of POS blocks and their content salience. We also hypothesise that the class membership of the parts of speech within such blocks reflects the content load of the blocks, on the basis that open class parts of speech are more content-bearing than closed class parts of speech. We test these hypotheses in the context of Information Retrieval, by syntactically representing queries, and removing from them content-poor blocks, in line with the aforementioned hypotheses. For our first hypothesis, we induce POS distribution information from a corpus, and approximate the probability of occurrence of POS blocks as per two statistical estimators separately. For our second hypothesis, we use simple heuristics to estimate the content load within POS blocks. We use the Text REtrieval Conference (TREC) queries of 1999 and 2000 to retrieve documents from the WT2G and WT10G test collections, with five different retrieval strategies. Experimental outcomes confirm that our hypotheses hold in the context of Information Retrieval.

M3 - Article in proceedings

SP - 531

EP - 538

BT - COLING-ACL '06 Proceedings of the COLING/ACL on Main conference

PB - Association for Computational Linguistics

ER -

ID: 38251887