Abstract: Multimodal sentiment analysis aims to assess users' sentiment by analyzing the videos they upload to social platforms. Current research on multimodal sentiment analysis primarily focuses on designing complex multimodal fusion networks to learn consistency information across modalities, which improves model performance to some extent. However, most studies overlook the complementary role of difference information across modalities, which leads to biased sentiment predictions. This study proposes DERL, a multimodal sentiment analysis model based on dual encoder representation learning. The model learns modality-invariant and modality-specific representations through a dual encoder structure. Specifically, a cross-modal interaction encoder based on a hierarchical attention mechanism learns modality-invariant representations across all modalities to capture consistency information, while an intra-modal encoder based on a self-attention mechanism learns modality-specific representations within each modality to capture difference information. Furthermore, two gate network units enhance and filter the encoded features, enabling a better combination of modality-invariant and modality-specific representations. Finally, during fusion, potential sentiment similarities among the different multimodal representations are captured for sentiment prediction by reducing the L2 distance between them. Experimental results on two publicly available datasets, CMU-MOSI and CMU-MOSEI, show that the model outperforms a range of baselines.
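
To make the dual-encoder idea concrete, the following is a minimal PyTorch sketch of the overall design described above: a shared cross-modal encoder for modality-invariant representations, per-modality self-attention encoders for modality-specific representations, gate units that combine the two, and an L2-based similarity term among the fused representations. All module names, dimensions, and the gating formulation are illustrative assumptions, not the authors' reference implementation.

```python
# Illustrative sketch of a dual-encoder representation learning model
# in the spirit of DERL; hyperparameters and module choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

MODALITIES = ("text", "audio", "vision")


class GateUnit(nn.Module):
    """Gate network that filters/enhances encoded features before fusion."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim * 2, dim)

    def forward(self, invariant, specific):
        # A sigmoid gate decides how much of each representation passes through.
        g = torch.sigmoid(self.gate(torch.cat([invariant, specific], dim=-1)))
        return g * invariant + (1.0 - g) * specific


class DualEncoder(nn.Module):
    """Shared (modality-invariant) and private (modality-specific) encoders."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Cross-modal interaction encoder shared by all modalities
        # (stands in for the hierarchical attention mechanism).
        self.shared = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        # One self-attention encoder per modality for specific representations.
        self.private = nn.ModuleDict({
            m: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for m in MODALITIES
        })
        self.gates = nn.ModuleDict({m: GateUnit(dim) for m in MODALITIES})
        self.head = nn.Linear(dim * len(MODALITIES), 1)  # sentiment regression head

    def forward(self, feats):
        fused = []
        for m in MODALITIES:
            x = feats[m]
            inv = self.shared(x).mean(dim=1)      # modality-invariant (consistency)
            spe = self.private[m](x).mean(dim=1)  # modality-specific (difference)
            fused.append(self.gates[m](inv, spe)) # gated combination
        # Similarity term: reduce the L2 distance among the per-modality
        # fused representations to capture shared sentiment information.
        sim_loss = sum(
            F.mse_loss(fused[i], fused[j])
            for i in range(len(fused)) for j in range(i + 1, len(fused))
        )
        pred = self.head(torch.cat(fused, dim=-1)).squeeze(-1)
        return pred, sim_loss


if __name__ == "__main__":
    model = DualEncoder()
    batch = {m: torch.randn(2, 10, 128) for m in MODALITIES}  # (batch, seq, dim)
    score, sim = model(batch)
    print(score.shape, sim.item())
```

In this sketch the similarity term would be added to the main prediction loss with a weighting coefficient; the relative weight and the exact fusion strategy are design choices left open here.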