3D Dense Captioning Method Based on Multi-level Context Voting

doi:10.15888/j.cnki.csa.008997

Author: WU Chun-Lei, HAO Yu-Qin, LI Yang
Affiliation: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
Fund Project: Natural Science Foundation of Shandong Province (ZR2020MF136)



Abstract:

Traditional three-dimensional (3D) dense captioning methods suffer from insufficient use of context information, loss of point cloud feature information, and hidden states that carry too little information. To address these problems, a multi-level context voting network is proposed. It uses the self-attention mechanism to capture the context information of point clouds during voting and exploits that information at multiple levels to improve the accuracy of object detection. In addition, a hidden state-attention temporal fusion module is designed, which fuses the hidden state at the current time step with the attention result from the previous time step, enriching the hidden state and thereby improving the expressiveness of the model. Furthermore, a “two-stage” training method is adopted to filter out low-quality generated object proposals and improve the quality of the descriptions. Extensive experiments on the official ScanNet and ScanRefer datasets show that the method achieves more competitive results than the baseline methods.

Key words: 3D dense captioning; attention mechanism; context voting; temporal fusion of hidden state and attention; two-stage training method
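
The abstract states that self-attention is used to capture point cloud context during voting, but it does not give the network's details. The following is only a minimal sketch of generic scaled dot-product self-attention applied to a handful of vote features. The identity query/key/value projections, the residual addition in `add_context`, and the toy `votes` data are all assumptions made for illustration, not the authors' architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    features: list of N vectors (lists of floats), one per vote point.
    Returns N context vectors; each is a weighted average of all inputs.
    Assumption: no learned Q/K/V weights, which a real network would have.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    contexts = []
    for query in features:
        # Similarity of this vote to every vote (including itself).
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        # Weighted average of all vote features.
        ctx = [sum(w * value[d] for w, value in zip(weights, features))
               for d in range(dim)]
        contexts.append(ctx)
    return contexts


def add_context(features):
    """Hypothetical fusion step: add each vote's context to its own feature."""
    contexts = self_attention(features)
    return [[f + c for f, c in zip(feat, ctx)]
            for feat, ctx in zip(features, contexts)]


# Toy vote features (made up for illustration).
votes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enriched = add_context(votes)
```

Each enriched vote keeps its own feature and adds a summary of the other votes, which is one simple way that context could reach a later proposal stage. A multi-level variant might repeat this step at several feature resolutions, though the abstract does not say how the levels are combined.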

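Cite this article

WU Chun-Lei, HAO Yu-Qin, LI Yang. 3D Dense Captioning Method Based on Multi-level Context Voting. Computer Systems & Applications, 2023, 32(3): 291-299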

History

Received: 2022-08-03
Last revised: 2022-09-07
Published online: 2022-12-09
