3D Gaze Estimation by Bidirectional Fusion of CNN and Transformer

DOI: 10.15888/j.cnki.csa.009649

2024, Vol. 33, No. 10, pp. 66-74

CSTR: 32024.14.csa.009649

Authors:

LYU Jia-Qi
School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China

WANG Chang-Yuan
School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China

Fund Project: National Natural Science Foundation of China (52072293)

Abstract:

Gaze estimation in unconstrained environments is easily disturbed by external factors and suffers from low accuracy. To address this, a gaze estimation method is proposed that runs a convolution branch and an attention branch in parallel and cross-fuses their features, improving both the effectiveness of feature fusion and overall network performance. First, the Mobile-Former network is modified by introducing a linear attention mechanism and partial convolution, which improves feature extraction while reducing computational cost. Second, a ResNet50 head pose estimation branch, pre-trained on the 300W-LP dataset, is added to improve gaze estimation accuracy, with a Sigmoid function serving as a gating unit to select effective features. Finally, face images are fed into the network for feature extraction and fusion, and the 3D gaze direction is output. Evaluated on the MPIIFaceGaze and Gaze360 datasets, the method achieves mean angular errors of 3.70° and 10.82°, respectively. Comparison with other mainstream 3D gaze estimation methods shows that the model estimates the 3D gaze direction relatively accurately while reducing computational complexity.
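The mean angular error quoted in the abstract is usually computed in the gaze estimation literature as the angle between the predicted and ground-truth 3D gaze vectors, averaged over all samples. The following is a minimal sketch of that metric, assuming gaze directions are given as 3D vectors; the function names are hypothetical and are not taken from the paper, whose exact evaluation code is not shown here.

```python
import math

def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors.

    The inputs need not be unit length; they are normalized here.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos))

def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground truths."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)

# Toy data (illustrative only): one exact prediction, one off by 90 degrees.
preds = [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
targets = [(0.0, 0.0, -2.0), (0.0, 1.0, 0.0)]
mean_error = mean_angular_error_deg(preds, targets)
```

Under this definition, a mean error of 3.70° means that predicted gaze vectors deviate from the ground truth by 3.70° on average.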

Key words: 3D gaze estimation; parallel structure; bidirectional fusion; partial convolution; linear attention mechanism

Cite this article

LYU Jia-Qi, WANG Chang-Yuan. 3D gaze estimation by bidirectional fusion of CNN and Transformer. Computer Systems & Applications, 2024, 33(10): 66-74

History

Received: 2024-03-18
Revised: 2024-04-16
Published online: 2024-08-21
