Quantum-inspired multimodal fusion for video sentiment analysis

Publication: Contribution to journal, journal article, peer-reviewed

Documents

  • Fulltext

    Submitted manuscript, 843 KB, PDF document

We tackle the crucial challenge of fusing features from different modalities for multimodal sentiment analysis. Existing approaches, mainly based on neural networks, largely model multimodal interactions in an implicit and hard-to-understand manner. We address this limitation by drawing inspiration from quantum theory, which contains principled methods for modeling complicated interactions and correlations. In our quantum-inspired framework, the word interaction within a single modality and the interaction across modalities are formulated with superposition and entanglement, respectively, at different stages. The complex-valued neural network implementation of the framework achieves results comparable to state-of-the-art systems on two benchmark video sentiment analysis datasets. At the same time, we produce the unimodal and bimodal sentiment directly from the model to interpret the entangled decision.
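The two quantum notions named in the abstract can be illustrated with a small NumPy sketch. It is not the implementation from the paper: the two-dimensional basis, the weights, and the helper names `superpose` and `schmidt_rank` are assumptions chosen for illustration. Superposition is shown as a normalized complex-weighted sum of basis states, and entanglement as a joint state of two modalities that cannot be written as a tensor product of one state per modality (Schmidt rank greater than 1).

```python
import numpy as np

def superpose(states, weights):
    """Complex-weighted sum of state vectors, renormalized to unit length."""
    psi = sum(w * s for w, s in zip(weights, states))
    return psi / np.linalg.norm(psi)

def schmidt_rank(joint, dim_a, dim_b, tol=1e-10):
    """Count nonzero singular values of the joint state reshaped to (dim_a, dim_b).

    Rank 1 means the joint state is a plain tensor product (separable);
    rank > 1 means it is entangled.
    """
    singular_values = np.linalg.svd(joint.reshape(dim_a, dim_b), compute_uv=False)
    return int(np.sum(singular_values > tol))

# Hypothetical two-dimensional basis shared by both modalities (illustration only).
e0 = np.array([1, 0], dtype=complex)
e1 = np.array([0, 1], dtype=complex)

# Within one modality: a state built as a superposition of basis states.
text_state = superpose([e0, e1], [1.0, 1j])
video_state = superpose([e0, e1], [1.0, -1.0])

# Across modalities: a separable joint state versus an entangled one.
product_state = np.kron(text_state, video_state)
entangled_state = (np.kron(e0, e0) + np.kron(e1, e1)) / np.sqrt(2)

print(schmidt_rank(product_state, 2, 2))
print(schmidt_rank(entangled_state, 2, 2))
```

In this toy setting the tensor-product state has Schmidt rank 1, while the second joint state has rank 2, which is the sense in which an entangled cross-modal state carries correlations that no pair of independent unimodal states can express.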

Original language: English
Journal: Information Fusion
Volume: 65
Pages (from-to): 58-71
ISSN: 1566-2535
DOI
Status: Published - 2021

Bibliographical note

Funding Information:
This study is supported by the Quantum Information Access and Retrieval Theory (QUARTZ) project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 721321. We would like to thank Prof. Jakob Simenson for his valuable feedback.

Publisher Copyright:
© 2020 Elsevier B.V.

ID: 306691917