Quantum-inspired multimodal fusion for video sentiment analysis
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Quantum-inspired multimodal fusion for video sentiment analysis. / Li, Qiuchi; Gkoumas, Dimitris; Lioma, Christina; Melucci, Massimo.
In: Information Fusion, Vol. 65, 2021, p. 58-71.
RIS
TY - JOUR
T1 - Quantum-inspired multimodal fusion for video sentiment analysis
AU - Li, Qiuchi
AU - Gkoumas, Dimitris
AU - Lioma, Christina
AU - Melucci, Massimo
N1 - Publisher Copyright: © 2020 Elsevier B.V.
PY - 2021
Y1 - 2021
N2 - We tackle the crucial challenge of fusing different modalities of features for multimodal sentiment analysis. Existing approaches, mainly based on neural networks, largely model multimodal interactions in an implicit and hard-to-interpret manner. We address this limitation by drawing inspiration from quantum theory, which offers principled methods for modeling complicated interactions and correlations. In our quantum-inspired framework, word interactions within a single modality and interactions across modalities are formulated with superposition and entanglement, respectively, at different stages. A complex-valued neural network implementation of the framework achieves results comparable to state-of-the-art systems on two benchmark video sentiment analysis datasets. At the same time, the model directly produces unimodal and bimodal sentiment, which helps interpret its entangled decisions.
AB - We tackle the crucial challenge of fusing different modalities of features for multimodal sentiment analysis. Existing approaches, mainly based on neural networks, largely model multimodal interactions in an implicit and hard-to-interpret manner. We address this limitation by drawing inspiration from quantum theory, which offers principled methods for modeling complicated interactions and correlations. In our quantum-inspired framework, word interactions within a single modality and interactions across modalities are formulated with superposition and entanglement, respectively, at different stages. A complex-valued neural network implementation of the framework achieves results comparable to state-of-the-art systems on two benchmark video sentiment analysis datasets. At the same time, the model directly produces unimodal and bimodal sentiment, which helps interpret its entangled decisions.
KW - Machine learning
KW - Multimodal sentiment analysis
KW - Quantum theory
U2 - 10.1016/j.inffus.2020.08.006
DO - 10.1016/j.inffus.2020.08.006
M3 - Journal article
AN - SCOPUS:85089582013
VL - 65
SP - 58
EP - 71
JO - Information Fusion
JF - Information Fusion
SN - 1566-2535
ER -
ID: 306691917