Image Captioning with Similar Temporal Attention Mechanism

doi:10.15888/j.cnki.csa.007996


Volume 30, Issue 7, 2021, pp. 232-238


Author:
DUAN Hai-Long, WU Chun-Lei, WANG Lei-Quan
College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China

                    

Abstract:

Attention mechanisms are now widely used in computer vision, notably in the encoder-decoder framework common to image captioning. However, the current decoding framework does not explicitly analyze the correlation between image features and the hidden states of the Long Short-Term Memory (LSTM) network, which leads to cumulative errors. In this study, we propose a Similar Temporal Attention Network (STAN) that extends conventional attention mechanisms to strengthen the correlation between attention results and hidden states at different time steps. STAN first applies attention to the hidden state and feature vector at the current time step. It then feeds the attention results of two adjacent LSTM segments into the recurrent LSTM network at the next time step through an Attention Fusion Slot (AFS), which enhances the correlation between attention results and hidden states. We also design a Hidden State Switch (HSS) to guide word generation; combined with the AFS, it reduces cumulative errors. Extensive experiments on the public benchmark dataset Microsoft COCO, scored with various evaluation metrics, show that our algorithm outperforms the baseline model and produces more competitive attention results.
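The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the paper's definition of STAN. It assumes plain dot-product soft attention between an LSTM hidden state and a set of region features, and it uses a hypothetical `fuse` function (a fixed convex combination with an assumed `gate` value) as a stand-in for combining the attention results of two adjacent time steps. The Hidden State Switch is not sketched, and all names and numbers here are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(features, hidden):
    """Dot-product soft attention (an assumption, not the paper's scoring
    function): weight each region feature by its similarity to the hidden
    state and return the weighted sum together with the weights."""
    weights = softmax([dot(f, hidden) for f in features])
    dim = len(hidden)
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return context, weights

def fuse(prev_context, curr_context, gate=0.5):
    """Hypothetical stand-in for the fusion step: a fixed convex
    combination of the attention results from two adjacent time steps."""
    return [gate * p + (1.0 - gate) * c
            for p, c in zip(prev_context, curr_context)]

# Toy example: three image regions with 2-d features, two decoding steps.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx_prev, w_prev = attend(features, [1.0, 0.0])  # hidden state at step t-1
ctx_curr, w_curr = attend(features, [0.0, 1.0])  # hidden state at step t
fused = fuse(ctx_prev, ctx_curr)  # would be passed to the LSTM at step t+1
```

In this sketch the fused vector carries information from both time steps into the next recurrent update, which is the role the abstract assigns to the AFS; a learned gate would be the natural replacement for the fixed `gate` value.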

Key words: image captioning; attention mechanism; similar temporal attention; LSTM network

Citation:

DUAN Hai-Long, WU Chun-Lei, WANG Lei-Quan. Image Captioning with Similar Temporal Attention Mechanism. Computer Systems & Applications, 2021, 30(7): 232-238.


History

Received: November 1, 2020
Revised: December 2, 2020
Online: July 2, 2021
