Multi-Sense Language Modelling

Research output: Contribution to journal › Journal article › Research

Standard

Multi-Sense Language Modelling. / Lekkas, Andrea; Schneider-Kamp, Peter; Augenstein, Isabelle.

In: arXiv, Vol. CoRR 2020, 2020.

Research output: Contribution to journal › Journal article › Research

Harvard

Lekkas, A, Schneider-Kamp, P & Augenstein, I 2020, 'Multi-Sense Language Modelling', arXiv, vol. CoRR 2020. https://arxiv.org/abs/2012.05776

APA

Lekkas, A., Schneider-Kamp, P., & Augenstein, I. (2020). Multi-Sense Language Modelling. arXiv, CoRR 2020. https://arxiv.org/abs/2012.05776

Vancouver

Lekkas A, Schneider-Kamp P, Augenstein I. Multi-Sense Language Modelling. arXiv. 2020;CoRR 2020. https://arxiv.org/abs/2012.05776

Author

Lekkas, Andrea ; Schneider-Kamp, Peter ; Augenstein, Isabelle. / Multi-Sense Language Modelling. In: arXiv. 2020 ; Vol. CoRR 2020.

Bibtex

@article{7a7a29bbccff4ea3abf250cd53dcc033,
title = "Multi-Sense Language Modelling",
abstract = "The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle the same word form having a plurality of meanings (polysemy). Currently, none of the common language modelling architectures explicitly model polysemy. We propose a language model which not only predicts the next word, but also its sense in context. We argue that this higher prediction granularity may be useful for end tasks such as assistive writing, and allow for a more precise linking of language models with knowledge bases. We find that multi-sense language modelling requires architectures that go beyond standard language models, and here propose a structured prediction framework that decomposes the task into a word prediction followed by a sense prediction task. For sense prediction, we utilise a Graph Attention Network, which encodes definitions and example uses of word senses. Overall, we find that multi-sense language modelling is a highly challenging task, and suggest that future work focus on the creation of more annotated training datasets.",
author = "Andrea Lekkas and Peter Schneider-Kamp and Isabelle Augenstein",
year = "2020",
url = "https://arxiv.org/abs/2012.05776",
language = "English",
volume = "CoRR 2020",
journal = "arXiv",
publisher = "arxiv.org",

}

RIS

TY - JOUR

T1 - Multi-Sense Language Modelling

AU - Lekkas, Andrea

AU - Schneider-Kamp, Peter

AU - Augenstein, Isabelle

PY - 2020

Y1 - 2020

N2 - The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle the same word form having a plurality of meanings (polysemy). Currently, none of the common language modelling architectures explicitly model polysemy. We propose a language model which not only predicts the next word, but also its sense in context. We argue that this higher prediction granularity may be useful for end tasks such as assistive writing, and allow for a more precise linking of language models with knowledge bases. We find that multi-sense language modelling requires architectures that go beyond standard language models, and here propose a structured prediction framework that decomposes the task into a word prediction followed by a sense prediction task. For sense prediction, we utilise a Graph Attention Network, which encodes definitions and example uses of word senses. Overall, we find that multi-sense language modelling is a highly challenging task, and suggest that future work focus on the creation of more annotated training datasets.

AB - The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle the same word form having a plurality of meanings (polysemy). Currently, none of the common language modelling architectures explicitly model polysemy. We propose a language model which not only predicts the next word, but also its sense in context. We argue that this higher prediction granularity may be useful for end tasks such as assistive writing, and allow for a more precise linking of language models with knowledge bases. We find that multi-sense language modelling requires architectures that go beyond standard language models, and here propose a structured prediction framework that decomposes the task into a word prediction followed by a sense prediction task. For sense prediction, we utilise a Graph Attention Network, which encodes definitions and example uses of word senses. Overall, we find that multi-sense language modelling is a highly challenging task, and suggest that future work focus on the creation of more annotated training datasets.

UR - https://arxiv.org/abs/2012.05776

M3 - Journal article

VL - CoRR 2020

JO - arXiv

JF - arXiv

ER -
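For readers skimming this record, the abstract's structured prediction framework reduces to two coupled predictions: choose the next word, then choose that word's sense given the context. Below is a minimal, hypothetical PyTorch sketch of that decomposition; it is not the authors' implementation, all names are invented, and the paper's Graph Attention Network over sense definitions and example uses is replaced here by a plain linear sense classifier.

# Minimal sketch of two-step word-then-sense prediction (hypothetical names;
# the paper's GAT sense encoder is simplified to a linear classifier here).
import torch
import torch.nn as nn

class TwoStepSenseLM(nn.Module):
    def __init__(self, vocab_size, n_senses, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True)
        self.word_head = nn.Linear(d_model, vocab_size)     # step 1: next-word logits
        self.sense_head = nn.Linear(2 * d_model, n_senses)  # step 2: sense logits

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))    # contextual hidden states
        word_logits = self.word_head(h)
        pred_words = word_logits.argmax(dim=-1)    # greedy next-word choice
        # Condition the sense prediction on both the context and the chosen word.
        sense_in = torch.cat([h, self.embed(pred_words)], dim=-1)
        return word_logits, self.sense_head(sense_in)

# Usage: word_logits is (batch, seq, vocab); sense_logits is (batch, seq, senses).
model = TwoStepSenseLM(vocab_size=10000, n_senses=2000)
word_logits, sense_logits = model(torch.randint(0, 10000, (2, 16)))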
