Received: August 01, 2023    Revised: September 01, 2023
Abstract: A key challenge in aspect-level multimodal sentiment classification is to accurately extract and fuse complementary information from the textual and visual modalities in order to detect the sentiment polarity of the aspect terms mentioned in the text. Most existing methods combine only a single source of context information with the image, so they are insensitive to the correlations among the aspect, the context, and the visual information, and they extract aspect-related local regions of the image imprecisely. In addition, incomplete modal information degrades the quality of feature fusion. To address these problems, this study proposes an attention fusion network, AF-Net, for aspect-level multimodal sentiment classification. A spatial transformer network (STN) learns the locations of objects in the image to help extract important local features. A Transformer-based interaction network models the relationships among the aspect, the text, and the image to realize multimodal interaction. The model further supplements the similarity information shared across modal features and fuses the multiple feature streams with a multi-head attention mechanism to obtain a joint multimodal representation, from which a Softmax layer produces the sentiment classification result. Experiments and comparisons on two benchmark datasets show that AF-Net achieves better performance and improves the effect of aspect-level multimodal sentiment classification.
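The abstract names the main stages of AF-Net but gives no implementation details. As a rough illustration only, the following PyTorch sketch wires those stages together end to end: an STN that resamples the image, aspect-to-text and aspect-to-image cross-attention for the Transformer-based interaction, and multi-head attention over the stacked feature streams before a Softmax layer. Every dimension, module choice, and the cosine-similarity gate standing in for the "similarity information" step are hypothetical assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of an AF-Net-style forward pass (hypothetical; not the
# authors' implementation). Dimensions, tokenization, and the similarity
# gating below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    """Spatial transformer: predicts an affine transform from the image and
    resamples it, so aspect-relevant regions can dominate later features."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=7), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 6),
        )
        # Start from the identity transform so early training does not warp
        # the image before the localization net has learned anything.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        theta = self.loc(img).view(-1, 2, 3)
        grid = F.affine_grid(theta, img.size(), align_corners=False)
        return F.grid_sample(img, grid, align_corners=False)

class AFNet(nn.Module):
    """Aspect + text + image -> sentiment distribution, following the stages
    named in the abstract: STN, Transformer-based interaction, similarity
    supplementation, multi-head attention fusion, Softmax."""
    def __init__(self, vocab=30000, d=256, heads=8, classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)      # shared text/aspect embedding
        self.stn = STN()
        self.patch = nn.Conv2d(3, d, kernel_size=16, stride=16)  # visual tokens
        self.txt_enc = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.a2t = nn.MultiheadAttention(d, heads, batch_first=True)  # aspect->text
        self.a2v = nn.MultiheadAttention(d, heads, batch_first=True)  # aspect->image
        self.fuse = nn.MultiheadAttention(d, heads, batch_first=True) # fusion
        self.cls = nn.Linear(d, classes)

    def forward(self, text_ids, aspect_ids, image):
        txt = self.txt_enc(self.embed(text_ids))        # contextual text features
        asp = self.embed(aspect_ids)                    # aspect features
        vis = self.patch(self.stn(image)).flatten(2).transpose(1, 2)  # (B, N, d)
        t, _ = self.a2t(asp, txt, txt)                  # aspect-aware text view
        v, _ = self.a2v(asp, vis, vis)                  # aspect-aware visual view
        # "Similarity information" between modalities, approximated here as a
        # cosine-similarity gate (an assumption for illustration).
        sim = F.cosine_similarity(t, v, dim=-1).unsqueeze(-1)
        feats = torch.cat([t, v, sim * t], dim=1)       # stack feature streams
        fused, _ = self.fuse(feats, feats, feats)       # multi-head fusion
        return F.softmax(self.cls(fused.mean(dim=1)), dim=-1)

net = AFNet()
probs = net(torch.randint(0, 30000, (2, 20)),   # sentence token ids
            torch.randint(0, 30000, (2, 3)),    # aspect-phrase token ids
            torch.randn(2, 3, 224, 224))        # RGB images
print(probs.shape)                              # torch.Size([2, 3])
```

Returning the Softmax output directly mirrors the pipeline described in the abstract; in practice one would return logits and train with a cross-entropy loss.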
Keywords: multimodal sentiment classification; spatial transformer network (STN); interaction network; similarity information; attention fusion network
Foundation item: National Natural Science Foundation of China (61070015)
Citation:
XIAN Guang-Ming, ZHAO Zhi-Feng, YANG Xian-Ping. Aspect-level Multimodal Sentiment Classification Based on Attention Fusion Network. COMPUTER SYSTEMS APPLICATIONS, 2024, 33(2): 94-104