Video Description Method Combining Feature Reinforcement and Knowledge Supplementation
Authors: Wang Lin, Bai Yunfan
Funding: Key Project of the Shaanxi Provincial Science and Technology Program (2017ZDCXL-GY-05-03)

    Abstract:

    Texts generated by existing video description models are often of low quality and lack novelty. To address this, this study proposes an encoder-decoder model based on feature reinforcement and textual knowledge supplementation. In the encoding stage, the model strengthens local and global features to enhance the fine-grained feature extraction of static objects in a video, which improves the discrimination of objects with similar semantics; it then fuses visual semantics and video features in a long short-term memory (LSTM) network. In the decoding stage, to mine implicit information in the video that machines can hardly discover on their own, the model samples a subset of video frames and detects the visual targets in them; the detected targets are then used to retrieve knowledge from an external knowledge base, which supplements the generation of the description and thus yields more novel and natural text. Experimental results on the MSVD and MSR-VTT datasets show that the proposed method performs well and that the generated descriptions can, to a certain extent, express novel implicit information.
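The decoding-stage pipeline in the abstract (sample frames, detect visual targets, retrieve related knowledge to enrich generation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the detector, the ConceptNet-style lookup table, and all names (`sample_frames`, `detect_objects`, `KNOWLEDGE_BASE`, the toy relations) are hypothetical stand-ins.

```python
# Hypothetical sketch of decoding-stage knowledge supplementation:
# sample a few frames, "detect" visual targets in them, then pull
# related facts from a small ConceptNet-style table that the caption
# generator could draw on for more novel wording.

# Stand-in for an external knowledge base such as ConceptNet:
# object label -> list of (relation, concept) pairs.
KNOWLEDGE_BASE = {
    "dog":   [("CapableOf", "bark"), ("AtLocation", "park")],
    "ball":  [("UsedFor", "play"), ("HasProperty", "round")],
    "piano": [("UsedFor", "music"), ("AtLocation", "concert hall")],
}

def sample_frames(frames, step):
    """Keep every `step`-th frame (a crude stand-in for frame sampling)."""
    return frames[::step]

def detect_objects(frame):
    """Stand-in for an object detector: here each 'frame' is already
    represented by its list of object labels."""
    return frame

def supplement_knowledge(frames, step=2):
    """Pair each detected object with the external facts retrieved for it."""
    facts = []
    for frame in sample_frames(frames, step):
        for obj in detect_objects(frame):
            for relation, concept in KNOWLEDGE_BASE.get(obj, []):
                facts.append((obj, relation, concept))
    return facts

# Toy "video": each frame is just its detected labels.
video = [["dog"], ["cat"], ["dog", "ball"], ["tree"]]
extra = supplement_knowledge(video)
# extra now holds triples such as ("dog", "CapableOf", "bark") that a
# decoder could use to supplement the generated description.
```

In a real system the sampled frames would go through a detector such as Fast R-CNN (cited as reference [22]) and the lookup would query ConceptNet (reference [21]); the point of the sketch is only the data flow from detected targets to supplementary knowledge.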

    References
    [1] Kojima A, Izumi M, Tamura T, et al. Generating natural language description of human behavior from video images. Proceedings of the 15th International Conference on Pattern Recognition. Barcelona: IEEE, 2000. 728–731.
    [2] Zhao B, Li XL, Lu XQ. CAM-RNN: Co-attention model based RNN for video captioning. IEEE Transactions on Image Processing, 2019, 28(11): 5552–5565. [doi: 10.1109/TIP.2019.2916757]
    [3] Liu CJ, Wechsler H. Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Transactions on Image Processing, 2002, 11(4): 467–476. [doi: 10.1109/TIP.2002.999679]
    [4] Song JK, Yang Y, Yang Y, et al. Inter-media hashing for large-scale retrieval from heterogeneous data sources. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2013. 785–796.
    [5] Krishnamoorthy N, Malkarnenkar G, Mooney R, et al. Generating natural-language video descriptions using text-mined knowledge. Proceedings of the 27th AAAI Conference on Artificial Intelligence. Bellevue: AAAI Press, 2013. 541–547.
    [6] Ordonez V, Kulkarni G, Berg TL. Im2Text: Describing images using 1 million captioned photographs. Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada: Curran Associates Inc., 2011. 1143–1151.
    [7] Donahue J, Hendricks LA, Rohrbach M, et al. Long-term recurrent convolutional networks for visual recognition and description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 677–691. [doi: 10.1109/TPAMI.2016.2599174]
    [8] Venugopalan S, Rohrbach M, Donahue J, et al. Sequence to sequence-video to text. Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015. 4534–4542.
    [9] Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015. 4507–4515.
    [10] Pei WJ, Zhang JY, Wang XR, et al. Memory-attended recurrent network for video captioning. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 8347–8356.
    [11] Gan Z, Gan C, He XD, et al. Semantic compositional networks for visual captioning. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 1141–1150.
    [12] Chen HR, Lin K, Maye A, et al. A semantics-assisted video captioning model trained with scheduled sampling. Frontiers in Robotics and AI, 2020, 7: 475767. [doi: 10.3389/frobt.2020.475767]
    [13] Chen M, Li YM, Zhang ZF, et al. TVT: Two-view transformer network for video captioning. Proceedings of the 10th Asian Conference on Machine Learning. Beijing: PMLR, 2018. 847–862.
    [14] Ding EJ, Liu ZY, Liu YF, et al. Video description method based on multi-dimensional and multi-modal information. Journal on Communications, 2020, 41(2): 36–43 (in Chinese). [doi: 10.11959/j.issn.1000-436x.2020037]
    [15] Li MX, Xu C, Li XW, et al. Research on a video description model for urban road scenes based on multi-modal fusion. Application Research of Computers, 2022 (in Chinese).
    [16] Zhang ZQ, Shi YY, Yuan CF, et al. Object relational graph with teacher-recommended learning for video captioning. Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 13275–13285.
    [17] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning. Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco: AAAI Press, 2017. 4278–4284.
    [18] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009. 248–255.
    [19] Zolfaghari M, Singh K, Brox T. ECO: Efficient convolutional network for online video understanding. Proceedings of the 15th European Conference on Computer Vision. Munich: Springer, 2018. 713–730.
    [20] Carreira J, Zisserman A. Quo vadis, action recognition? A new model and the Kinetics dataset. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 4724–4733.
    [21] Speer R, Chin J, Havasi C. ConceptNet 5.5: An open multilingual graph of general knowledge. Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco: AAAI Press, 2017. 4444–4451.
    [22] Girshick R. Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015. 1440–1448.
    [23] Chen DL, Dolan WB. Collecting highly parallel data for paraphrase evaluation. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland: ACL, 2011. 190–200.
    [24] Xu J, Mei T, Yao T, et al. MSR-VTT: A large video description dataset for bridging video and language. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 5288–5296.
    [25] Papineni K, Roukos S, Ward T, et al. BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia: ACL, 2002. 311–318.
    [26] Banerjee S, Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the 2005 ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor: ACL, 2005. 65–72.
    [27] Lin CY. ROUGE: A package for automatic evaluation of summaries. Proceedings of the 2004 Text Summarization Branches Out. Barcelona: ACL, 2004. 74–81.
    [28] Vedantam R, Zitnick CL, Parikh D. CIDEr: Consensus-based image description evaluation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015. 4566–4575.
Cite this article

Wang L, Bai YF. Video description method combining feature reinforcement and knowledge supplementation. Computer Systems & Applications, 2023, 32(5): 273–282 (in Chinese)

History
  • Received: 2022-11-07
  • Revised: 2022-12-10
  • Published online: 2023-03-24