Complex-valued Neural Network-based Quantum Language Models

Publication: Contribution to journal › Journal article › Research › Peer reviewed

Documents

Fulltext
Accepted manuscript, 1.13 MB, PDF document

Peng Zhang
Wenjie Hui
Benyou Wang
Donghao Zhao
Dawei Song
Christina Lioma
Jakob Grue Simonsen

Language modeling is essential in Natural Language Processing and Information Retrieval tasks. Following statistical language models, the Quantum Language Model (QLM) was proposed to unify single words and compound terms in the same probability space without expanding the term space exponentially. Although QLM achieved good performance in ad hoc retrieval, it still has two major limitations: (1) QLM cannot make use of supervised information, mainly because the density matrix, which represents both queries and documents in QLM, is estimated iteratively and non-differentiably; (2) QLM assumes the exchangeability of words or word dependencies, neglecting the order or position information of words.

This article aims to generalize QLM and make it applicable to more complicated matching tasks (e.g., Question Answering) beyond ad hoc retrieval. We propose a complex-valued neural network-based QLM solution, called C-NNQLM, that builds and trains density matrices end to end in a lightweight and differentiable manner, and can therefore make use of external well-trained word vectors and supervised labels. Furthermore, C-NNQLM adopts complex-valued word vectors whose phase vectors can directly encode the order (or position) information of words. Note that complex numbers are also essential in quantum theory. We show that the real-valued NNQLM (R-NNQLM) is a special case of C-NNQLM.

The experimental results on the QA task show that both R-NNQLM and C-NNQLM achieve much better performance than the vanilla QLM, and that C-NNQLM's performance is on par with state-of-the-art neural network models. We also evaluate the proposed C-NNQLM on text classification and document retrieval tasks. The results on most datasets show that C-NNQLM can outperform R-NNQLM, which demonstrates the usefulness of the complex representation of words and sentences in C-NNQLM.
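To make the abstract's central construction more concrete, here is a minimal sketch; it is not the authors' implementation. It assumes, in the spirit of NNQLM, that a sentence's density matrix is a weighted mixture ρ = Σ_i p_i |w_i⟩⟨w_i| of L2-normalized word vectors. It also assumes a hypothetical complex embedding in which component j of a word at position pos has amplitude a_j and phase ω_j·pos; the exact phase parameterization used in C-NNQLM may differ. All function names, frequencies and amplitudes are illustrative. The toy score tr(ρ_q ρ_d) is just one simple signal computed from a query and a document representation, whereas the article trains a network end to end with supervised labels.

```python
import cmath
import math
from typing import List, Sequence

Vector = List[complex]
Matrix = List[List[complex]]


def complex_word_vector(amplitudes: Sequence[float], frequencies: Sequence[float],
                        position: int) -> Vector:
    """Hypothetical position-aware embedding: a_j * exp(i * omega_j * position)."""
    return [a * cmath.exp(1j * w * position) for a, w in zip(amplitudes, frequencies)]


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm so that |v><v| has trace 1."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def outer(u: Vector, v: Vector) -> Matrix:
    """Outer product |u><v|: entry (i, j) is u_i * conj(v_j)."""
    return [[ui * vj.conjugate() for vj in v] for ui in u]


def density_matrix(vectors: Sequence[Vector], weights: Sequence[float]) -> Matrix:
    """Mixture rho = sum_i p_i |w_i><w_i| with p_i = weight_i / sum(weights)."""
    total = sum(weights)
    n = len(vectors[0])
    rho: Matrix = [[0j] * n for _ in range(n)]
    for weight, v in zip(weights, vectors):
        u = normalize(v)
        projector = outer(u, u)
        for i in range(n):
            for j in range(n):
                rho[i][j] += (weight / total) * projector[i][j]
    return rho


def trace(m: Matrix) -> complex:
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


def matching_score(rho_q: Matrix, rho_d: Matrix) -> float:
    """Toy score tr(rho_q rho_d); real and in [0, 1] for two density matrices."""
    n = len(rho_q)
    return sum(rho_q[i][k] * rho_d[k][i] for i in range(n) for k in range(n)).real


if __name__ == "__main__":
    freqs = [0.0, 1.0]                  # hypothetical per-dimension frequencies
    w1, w2 = [1.0, 0.5], [0.2, 1.0]     # hypothetical word amplitudes
    # "w1 w2" versus "w2 w1": same words, opposite order.
    q = density_matrix([complex_word_vector(w1, freqs, 0),
                        complex_word_vector(w2, freqs, 1)], [1.0, 1.0])
    d = density_matrix([complex_word_vector(w2, freqs, 0),
                        complex_word_vector(w1, freqs, 1)], [1.0, 1.0])
    print(trace(q), matching_score(q, d))
```

Setting every ω_j to zero makes all word vectors real, which mirrors the abstract's remark that R-NNQLM is a special case of C-NNQLM. In that real case, swapping two words' positions leaves ρ unchanged, while distinct frequencies put position-dependent phases into the off-diagonal entries, so ρ becomes sensitive to word order.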

Original language	English
Article number	84
Journal	ACM Transactions on Information Systems
Volume	40
Issue number	4
Number of pages	31
ISSN	1046-8188
DOI	https://doi.org/10.1145/3505138
Status	Published - 2022

Bibliographic note

Funding Information:
This article is the extended version of a conference paper [68] in AAAI 2018. This work is supported in part by the state key development program of China (grant No. 2017YFE0111900), the Natural Science Foundation of China (grant No. 61772363), and the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 721321. Authors' addresses: P. Zhang, W. Hui, and D. Zhao, Tianjin University, College of Intelligence and Computing, Tianjin, China; emails: {pzhang, wenjiehui, zhaodh}@tju.edu.cn; B. Wang (corresponding author), University of Padua, Padua, Italy; email: wang@dei.unipd.it; D. Song, Beijing Institute of Technology, Beijing, China; email: dawei.song2020@gmail.com; C. Lioma and J. G. Simonsen, University of Copenhagen, Copenhagen, Denmark; emails: c.lioma@di.ku.dk, simonsen@di.ku.dk. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Association for Computing Machinery. 1046-8188/2022/03-ART84 $15.00 https://doi.org/10.1145/3505138

Publisher Copyright:
© 2022 Association for Computing Machinery.


ID: 339908602

Department of Computer Science