Object Detection Based on Adaptive Token Pooling and Enhanced Set Prediction (基于自适应Token池化与集合预测增强的目标检测)

doi:10.15888/j.cnki.csa.009765

Published in: Computer Systems & Applications (计算机系统应用), 2025, Vol. 34, No. 2, pp. 74-83

Authors: LIU Yao (刘耀), CHEN Dong-Fang (陈东方), WANG Xiao-Feng (王晓峰)
Affiliation: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China
Funding: Key Project of the Scientific Research Program of Hubei Provincial Department of Education (D20211106)

Abstract:

Transformer-based object detection algorithms often suffer from limited accuracy and slow convergence. Many studies have addressed these problems with some success, but most overlook two shortcomings that arise when the Transformer architecture is applied to object detection. First, the results of self-attention lack diversity. Second, set prediction is difficult, which makes the model unstable during target matching. To address these shortcomings, this study first designs an adaptive token pooling module that increases the diversity of self-attention weights. Second, it designs an anchor box localization module based on coarse prediction, which supplies queries with positional prior information and thereby improves the stability of bipartite matching. Finally, it designs a group-based denoising task that trains the model to distinguish positive from negative queries located near a target, improving its ability to perform set prediction. Experimental results show that the improved algorithm achieves good results on the COCO dataset; compared with the baseline model, it offers considerably higher detection accuracy and faster convergence.

Key words: object detection; query initialization method; self-attention; training strategy
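
The abstract refers to bipartite matching between queries and targets but does not state the matching cost. As a rough illustration of the general idea only, the sketch below assigns each ground-truth object to a distinct query so that the total cost is minimal. The cost used here (L1 box distance minus the predicted probability of the target class), the function names, and the toy values are all assumptions made for this example, not the paper's formulation. Brute force over permutations stands in for the Hungarian algorithm that practical detectors typically use.

```python
from itertools import permutations


def box_l1(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_queries_to_targets(pred_boxes, pred_scores, gt_boxes, gt_labels):
    """Return (total_cost, pairs) for a minimum-cost one-to-one assignment.

    pairs[j] is the index of the query assigned to ground-truth object j.
    Hypothetical helper: exhaustive search, suitable for toy inputs only.
    """
    n_q, n_gt = len(pred_boxes), len(gt_boxes)
    assert n_q >= n_gt, "need at least as many queries as objects"

    # cost[i][j]: assumed cost of assigning query i to ground-truth object j.
    cost = [
        [box_l1(pred_boxes[i], gt_boxes[j]) - pred_scores[i][gt_labels[j]]
         for j in range(n_gt)]
        for i in range(n_q)
    ]

    best_cost, best_pairs = float("inf"), None
    # Each permutation picks n_gt distinct queries, one per ground-truth object.
    for queries in permutations(range(n_q), n_gt):
        total = sum(cost[q][j] for j, q in enumerate(queries))
        if total < best_cost:
            best_cost, best_pairs = total, queries
    return best_cost, list(best_pairs)


# Toy data (assumed): three queries, two ground-truth objects of class 0.
pred_boxes = [(0.50, 0.50, 0.20, 0.20),
              (0.10, 0.10, 0.10, 0.10),
              (0.80, 0.80, 0.30, 0.30)]
pred_scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]  # per-class probabilities
gt_boxes = [(0.82, 0.78, 0.30, 0.30), (0.52, 0.50, 0.20, 0.22)]
gt_labels = [0, 0]

cost, pairs = match_queries_to_targets(pred_boxes, pred_scores, gt_boxes, gt_labels)
print(pairs)  # [2, 0]: object 0 goes to query 2, object 1 to query 0
```

Query 1 stays unmatched and would be treated as "no object". When query positions are poorly initialized, small changes in the predictions can flip such an assignment between training steps, which is the kind of instability that positional priors for queries are meant to reduce.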

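Cite this article:

LIU Yao, CHEN Dong-Fang, WANG Xiao-Feng. Object Detection Based on Adaptive Token Pooling and Enhanced Set Prediction. Computer Systems & Applications (计算机系统应用), 2025, 34(2): 74-83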

History:

Received: 2024-07-17
Last revised: 2024-08-13
Published online: 2024-12-16
