Visual Question Answering with Symmetrical Attention Mechanism
(基于对称注意力机制的视觉问答系统)

Authors: Lu Jing, Wu Chunlei, Wang Leiquan

Funding: Key Research and Development Program of Shandong Province (2019GGX101015); Independent Innovation Research Program of Central Universities (20CX05018A, 18CX02136A)
Abstract:

In recent years, visual question answering (VQA) based on the fusion of visual features from images and textual features from questions has attracted wide attention from researchers. Most existing models aggregate similarities between image regions and question-word pairs, performing fine-grained interaction and matching through attention mechanisms and dense iterative operations, but they ignore the self-correlation information within image regions and within question words. This paper proposes a model architecture based on a symmetrical attention mechanism, which effectively exploits the semantic association between images and questions, thereby reducing the bias in overall semantic understanding and improving the accuracy of answer prediction. Experiments conducted on the VQA 2.0 dataset show that the proposed model has clear advantages over the baseline model.
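To make the idea concrete, the sketch below illustrates one plausible form of symmetric co-attention: an affinity matrix between word and region features is used to attend in both directions (questions over regions and regions over words), and a simple self-attention term models the intra-modal self-correlation the abstract mentions. This is a minimal NumPy illustration under assumed feature shapes, not the authors' exact architecture; all names (`symmetric_coattention`, `self_attention`, `V`, `Q`, `W`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def symmetric_coattention(V, Q, W):
    """Attend symmetrically across modalities.

    V: (m, d) image-region features, Q: (n, d) question-word features,
    W: (d, d) bilinear weight (learned in a real model; fixed here).
    Returns question-attended visual context and image-attended text context.
    """
    A = Q @ W @ V.T                  # (n, m) affinity of every word/region pair
    attn_v = softmax(A, axis=1)      # each word's distribution over regions
    attn_q = softmax(A, axis=0).T    # each region's distribution over words
    V_ctx = attn_v @ V               # (n, d) visual context per question word
    Q_ctx = attn_q @ Q               # (m, d) textual context per image region
    return V_ctx, Q_ctx

def self_attention(X):
    """Intra-modal self-correlation (regions with regions, or words with words)."""
    S = softmax(X @ X.T, axis=-1)
    return S @ X
```

In a full model, `V_ctx` and `Q_ctx` (together with the self-attended features) would then be fused, for example by element-wise multiplication or concatenation, and fed to an answer classifier.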

Cite this article:

Lu Jing, Wu Chunlei, Wang Leiquan. Visual question answering with symmetrical attention mechanism. Computer Systems & Applications (计算机系统应用), 2021, 30(5): 114-119.
History:
  • Received: 2020-09-15
  • Revised: 2020-10-13
  • Published online: 2021-05-06