Image Caption Generation Model Based on Convolutional Block Attention Module

doi:10.15888/j.cnki.csa.008043


2021, Vol. 30, No. 8, pp. 194-200. DOI:10.15888/j.cnki.csa.008043


Authors: YU Hai-Bo, CHEN Jin-Guang

Affiliation (both authors): School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China; Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce, Luoyang 471934, China

Fund Project: Open Project of the Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce (2020-KF-7); Scientific Research Program of the Shaanxi Provincial Department of Education (21JP049)


Abstract:

An image caption generation model is an algorithmic model that describes, in natural language, the content of an image and the relationships among its attributes. Existing models suffer from low description quality, insufficient feature extraction for the important parts of an image, and excessive complexity. To address these problems, this study proposes an image caption generation model based on the Convolutional Block Attention Module (CBAM). The model has an encoder-decoder structure: CBAM is added to the feature extraction network Inception-v4, which serves as the encoder and extracts the important feature information of the image; this information is then fed into a Long Short-Term Memory (LSTM) network, which serves as the decoder and generates the caption for the image. The training and validation sets of the MSCOCO2014 data set are used for training and testing, and multiple evaluation criteria are used to assess the accuracy of the model. The experimental results show that the improved model scores higher on the evaluation criteria than other models, and that the Model2 experiment extracts image features better and generates more accurate descriptions.

Key words: image caption generation; Convolutional Block Attention Module (CBAM); Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM)
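
The abstract names CBAM but does not say where it sits inside Inception-v4 or which hyperparameters are used. As a rough illustration only, the sketch below follows the general CBAM design (channel attention followed by spatial attention) on a single feature map in NumPy. The reduction ratio, the 7x7 spatial kernel, the random weights, and all function names are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns weights of shape (C,)."""
    avg = feat.mean(axis=(1, 2))  # global average pooling per channel
    mx = feat.max(axis=(1, 2))    # global max pooling per channel

    def mlp(v):
        # Shared two-layer MLP with ReLU, applied to both pooled vectors
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(feat, kernel):
    """feat: (C, H, W); kernel: (2, k, k) with odd k. Returns weights of shape (H, W)."""
    # Pool across channels, then stack the two maps as a 2-channel input
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Naive k x k convolution (cross-correlation) at position (i, j)
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]

# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

refined = cbam(feat, w1, w2, kernel)
print(refined.shape)
```

The refined map has the same shape as the input, which is what allows a module of this kind to be dropped between existing convolutional blocks of an encoder without changing the layers around it. In a trained model the weights would be learned jointly with the rest of the network rather than sampled at random.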

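Cite this article

YU Hai-Bo, CHEN Jin-Guang. Image caption generation model based on convolutional block attention module. Computer Systems & Applications, 2021, 30(8): 194-200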


History

Received: 2020-11-24
Revised: 2020-12-22
Published online: 2021-08-03
