Contextually propagated term weights for document representation
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score given to the predicted word (or term weight). We present a novel model that, given a target word, redistributes part of that word's weight (computed with word embeddings) across words occurring in contexts similar to those of the target word. Our model thus aims to simulate how semantic meaning is shared by words occurring in similar contexts, and incorporates this into bag-of-words document representations. Experimental evaluation in an unsupervised setting against eight state-of-the-art baselines shows that our model yields the best micro and macro F1 scores across datasets of increasing difficulty.
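The core idea of redistributing part of a term's weight to contextually similar words can be sketched as follows. This is an illustrative implementation, not the paper's actual method: the function name `propagate_weights`, the redistribution fraction `alpha`, the neighbourhood size `k`, and the use of cosine similarity over pre-trained embeddings are all assumptions made for the example.

```python
import numpy as np

def propagate_weights(weights, embeddings, alpha=0.5, k=2):
    """Sketch: move a fraction `alpha` of each word's weight to its
    k most similar words (cosine similarity of their embeddings),
    in proportion to similarity. Total weight is conserved."""
    words = list(weights)
    E = np.array([embeddings[w] for w in words], dtype=float)
    E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-normalise rows
    sim = E @ E.T                                    # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                   # a word is not its own neighbour
    new = {w: (1 - alpha) * weights[w] for w in words}
    for i, w in enumerate(words):
        nbrs = np.argsort(sim[i])[-k:]               # indices of the k nearest neighbours
        sims = np.clip(sim[i][nbrs], 0.0, None)      # ignore negative similarities
        if sims.sum() > 0:
            share = alpha * weights[w] * sims / sims.sum()
            for j, s in zip(nbrs, share):
                new[words[j]] += s
        else:
            new[w] += alpha * weights[w]             # no similar words: keep the weight
    return new
```

For instance, with toy 2-d embeddings where "dog" lies close to "cat" but "car" is orthogonal to both, "dog" ends up with more weight than "cat", while the total weight of the document stays unchanged.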
Original language | English |
---|---|
Title of host publication | SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval |
Publisher | Association for Computing Machinery |
Publication date | 18 Jul 2019 |
Pages | 897-900 |
ISBN (Electronic) | 9781450361729 |
DOIs | |
Publication status | Published - 18 Jul 2019 |
Event | 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019 - Paris, France Duration: 21 Jul 2019 → 25 Jul 2019 |
Conference
Conference | 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019 |
---|---|
Country | France |
City | Paris |
Period | 21/07/2019 → 25/07/2019 |
Sponsor | ACM SIGIR |
Series | SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval |
---|
Research areas
- Contextual semantics, Document representation, Word embeddings
Links
- http://arxiv.org/pdf/1906.00674 (submitted manuscript)