Chinese Image Caption with Dual Attention and Multi-Label Image

Authors: 田枫, 孙小强, 刘芳, 李婷玉, 张蕾, 刘志刚

Funding: Natural Science Foundation of Heilongjiang Province (LH2020F003); National Natural Science Foundation of China (61502094); Fundamental Research Funds for Heilongjiang Provincial Undergraduate Institutions (KYCXTD201903); Talent Cultivation Support Program of the Central Government Funds for the Reform and Development of Local Universities (140119001); Graduate Education Innovation Project of Northeast Petroleum University (JYCX_11_2020); Guided Innovation Fund of Northeast Petroleum University (2020YDL-11)
    Abstract:

    Image captioning is a research hotspot in the field of image understanding. To address the low quality of generated Chinese image descriptions, this paper proposes a Chinese image captioning method that fuses dual attention with multi-label prediction. The method first extracts visual features and multi-label text from the input image, then uses the multi-label text to strengthen the correlation between the decoder's hidden state and the visual features. Attention weights are assigned to the visual features according to the decoder's hidden state, the weighted visual features are decoded into words, and the words are emitted in time-step order to form a Chinese sentence. Experiments on the Chinese image captioning datasets Flickr8k-CN and COCO-CN show that the proposed model substantially improves caption quality.
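The pipeline described in the abstract — region-level visual features, a label embedding that enriches the decoder state, and attention weights computed from that enriched state — can be sketched numerically. This is a minimal illustrative sketch, not the authors' exact model: all dimensions, weight matrices, and the additive-attention scoring form are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical dimensions: D_v region-feature dim, D_h decoder hidden dim,
# D_l multi-label embedding dim, D_a attention dim, K image regions.
rng = np.random.default_rng(0)
D_v, D_h, D_l, D_a, K = 8, 6, 4, 5, 3

V = rng.normal(size=(K, D_v))   # K region features from a CNN encoder
h = rng.normal(size=(D_h,))     # decoder (LSTM) hidden state at one time step
l = rng.normal(size=(D_l,))     # embedding of the predicted multi-label text

# Randomly initialized projection matrices (learned in a real model).
W_v = rng.normal(size=(D_v, D_a))
W_h = rng.normal(size=(D_h, D_a))
W_l = rng.normal(size=(D_l, D_a))
w_a = rng.normal(size=(D_a,))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Label-enhanced additive attention: the hidden state and the label
# embedding jointly score each image region.
scores = np.tanh(V @ W_v + h @ W_h + l @ W_l) @ w_a   # shape (K,)
alpha = softmax(scores)          # attention weights over regions, sum to 1
context = alpha @ V              # weighted visual feature fed to the decoder

print("attention weights:", alpha.round(3))
```

At each decoding step the context vector would be combined with the previous word embedding to predict the next word; repeating this over time steps yields the output sentence.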

Cite this article:

田枫, 孙小强, 刘芳, 李婷玉, 张蕾, 刘志刚. Chinese image caption with dual attention and multi-label image. 计算机系统应用 (Computer Systems & Applications), 2021, 30(7): 32-40.

History
  • Received: 2020-10-22
  • Revised: 2020-11-28
  • Published online: 2021-07-02