Visual Question Answering with Symmetrical Attention Mechanism
(基于对称注意力机制的视觉问答系统)

Authors: Lu Jing, Wu Chunlei, Wang Leiquan

Funding: Key Research and Development Program of Shandong Province (2019GGX101015); Independent Innovation Research Program of Central Universities (20CX05018A, 18CX02136A)
Abstract:

In recent years, visual question answering (VQA) based on the fusion of visual features from images and textual features from questions has attracted wide attention from researchers. Most existing models aggregate similarities between image regions and question-word pairs, performing fine-grained interaction and matching through attention mechanisms and dense iterative operations, but they ignore the self-correlation information within image regions and within question words. This paper proposes a model architecture based on a symmetrical attention mechanism, which effectively exploits the semantic association between images and questions, thereby reducing the bias in overall semantic understanding and improving the accuracy of answer prediction. Experiments conducted on the VQA 2.0 dataset show that the proposed model has clear advantages over the baseline model.
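To make the idea concrete, the sketch below illustrates one plausible form of symmetric co-attention: an affinity matrix between word and region features is used to attend in both directions (questions over regions and regions over words), and a simple self-attention term models the intra-modal self-correlation the abstract mentions. This is a minimal NumPy illustration under assumed feature shapes, not the authors' exact architecture; all names (`symmetric_coattention`, `self_attention`, `V`, `Q`, `W`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def symmetric_coattention(V, Q, W):
    """Attend symmetrically across modalities.

    V: (m, d) image-region features, Q: (n, d) question-word features,
    W: (d, d) bilinear weight (learned in a real model; fixed here).
    Returns question-attended visual context and image-attended text context.
    """
    A = Q @ W @ V.T                  # (n, m) affinity of every word/region pair
    attn_v = softmax(A, axis=1)      # each word's distribution over regions
    attn_q = softmax(A, axis=0).T    # each region's distribution over words
    V_ctx = attn_v @ V               # (n, d) visual context per question word
    Q_ctx = attn_q @ Q               # (m, d) textual context per image region
    return V_ctx, Q_ctx

def self_attention(X):
    """Intra-modal self-correlation (regions with regions, or words with words)."""
    S = softmax(X @ X.T, axis=-1)
    return S @ X
```

In a full model, `V_ctx` and `Q_ctx` (together with the self-attended features) would then be fused, for example by element-wise multiplication or concatenation, and fed to an answer classifier.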

Cite this article:

Lu Jing, Wu Chunlei, Wang Leiquan. Visual question answering with symmetrical attention mechanism. Computer Systems & Applications (计算机系统应用), 2021, 30(5): 114-119.
History:
  • Received: 2020-09-15
  • Revised: 2020-10-13
  • Published online: 2021-05-06