Image Caption Generation Model Based on Convolutional Block Attention Module

doi:10.15888/j.cnki.csa.008043

Volume 30, Issue 8, 2021, pp. 194-200. DOI: 10.15888/j.cnki.csa.008043

Authors: YU Hai-Bo, CHEN Jin-Guang

Affiliation: School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China; Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce, Luoyang 471934, China

Abstract:

An image caption generation model uses natural language to describe the content of an image and the relationships between its attributes. Existing models suffer from low description quality, insufficient feature extraction from the important parts of images, and high complexity. This study therefore proposes an image caption generation model with an encoder-decoder structure based on the Convolutional Block Attention Module (CBAM). CBAM is added to the Inception-v4 feature extraction network, which serves as the encoder and extracts the important feature information of the images. This information is then fed into the Long Short-Term Memory (LSTM) network of the decoder to generate captions for the corresponding images. The model is trained and tested on the MSCOCO2014 dataset, and multiple evaluation criteria are used to assess its accuracy. The experimental results show that the improved model achieves higher scores on the evaluation criteria than the other models compared, and that Model2 extracts image features better and generates more accurate descriptions.
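
As a rough illustration of the attention step the abstract refers to, the sketch below follows the commonly cited CBAM formulation (channel attention followed by spatial attention) in plain Python. It is not the authors' implementation: the function names, the nested-list tensor layout (channels x height x width), and the weight arguments `w1`, `w2`, and `kernel` are assumptions made for illustration, and the way CBAM is wired into Inception-v4 is not shown.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(feat, w1, w2):
    """Return one weight in (0, 1) per channel.

    feat: C x H x W nested lists.
    w1:   (C/r) x C weights, w2: C x (C/r) weights of a shared two-layer MLP
          (hypothetical parameters, biases omitted for brevity).
    """
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    area = len(feat[0]) * len(feat[0][0])
    avg = [sum(sum(row) for row in ch) / area for ch in feat]  # global average pool
    mx = [max(max(row) for row in ch) for ch in feat]          # global max pool
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(feat, kernel):
    """Return an H x W map of weights in (0, 1).

    kernel: 2 x k x k convolution weights applied to the stacked
            [channel-average, channel-max] maps with zero padding.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    k = len(kernel[0])
    pad = k // 2
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    maps = [avg, mx]
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += kernel[m][di][dj] * maps[m][y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a feature map."""
    ca = channel_attention(feat, w1, w2)
    refined = [[[v * ca[c] for v in row] for row in ch] for c, ch in enumerate(feat)]
    sa = spatial_attention(refined, kernel)
    return [[[v * sa[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]
```

The output has the same shape as the input, so under these assumptions such a block can sit between convolutional stages of an encoder without changing the layers that follow. In practice the same computation would be written with a tensor library rather than nested lists.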

Keywords: image caption generation; Convolutional Block Attention Module (CBAM); Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM)

Citation

YU Hai-Bo, CHEN Jin-Guang. Image Caption Generation Model Based on Convolutional Block Attention Module. Computer Systems & Applications, 2021, 30(8): 194-200

History

Received: November 24, 2020
Revised: December 22, 2020
Online: August 03, 2021

Copyright: Institute of Software, Chinese Academy of Sciences