Received: April 21, 2024    Revised: May 20, 2024
中文摘要: 为了解决图像采集过程中眼图消失和头部姿态估计不准确的问题, 利用基于非接触式的眼部信息获取方法采集人脸图像, 从单个图像帧中确定飞行员当前的注视方向. 同时, 针对现有网络忽略头部运动对视线造成遮挡所导致的分类效果不佳问题, 结合人脸图像与头部姿态特征, 通过改进的MobileViT模型提出一种用于飞行员注视区域分类的多模态数据融合网络. 首先提出了多模态数据融合模块解决特征拼接过程中尺寸不平衡导致的过拟合问题, 其次提出一种基于并行分支SE机制的逆残差块, 充分利用网络浅层的空间和通道特征信息, 并结合Transformer的全局注意力机制捕捉多尺度特征. 最后, 重新设计了Mobile Block结构, 使用深度可分离卷积降低模型复杂度. 利用自制数据集FlyGaze对新模型和主流基线模型进行对比, 实验结果表明, PilotT模型对注视区域0、3、4、5的分类准确率均在92%以上, 且对人脸发生偏转的情况具有较强适应力. 研究结果对提升飞行训练质量以及飞行员意图识别和疲劳评估具有实际应用价值.
Abstract: To address eye-image disappearance and inaccurate head pose estimation during image capture, a non-contact method for acquiring eye information is employed to collect facial images, and the pilot's current gaze direction is determined from a single image frame. Meanwhile, to address the poor classification performance of existing networks, which neglect the occlusion of the line of sight caused by head movements, a multimodal data fusion network for pilot gaze-region classification is proposed by combining facial images with head-pose features on the basis of an improved MobileViT model. First, a multimodal data fusion module is introduced to mitigate the overfitting caused by size imbalance during feature concatenation. Second, an inverted residual block based on a parallel-branch SE mechanism is proposed to fully exploit the spatial and channel feature information in the shallow layers of the network, and the global attention mechanism of the Transformer is integrated to capture multi-scale features. Finally, the Mobile Block structure is redesigned, using depthwise separable convolution to reduce model complexity. The new model is compared against mainstream baseline models on the self-built FlyGaze dataset. Experimental results show that the PilotT model achieves classification accuracies above 92% for gaze regions 0, 3, 4, and 5, and remains robust when the face is deflected. These findings have practical value for improving flight-training quality and for pilot intention recognition and fatigue assessment.
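The abstract attributes the reduced model complexity of the redesigned Mobile Block to depthwise separable convolution. A minimal sketch of why this helps: a standard k×k convolution couples all input and output channels, while a depthwise separable convolution factors it into a per-channel spatial filter plus a 1×1 pointwise mixing step. The channel sizes below are illustrative, not taken from the paper.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Depthwise k x k convolution (one spatial filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    depthwise = c_in * k * k      # spatial filtering, channel-wise
    pointwise = c_in * c_out      # cross-channel mixing
    return depthwise + pointwise

if __name__ == "__main__":
    c_in, c_out = 64, 128                       # illustrative channel sizes
    std = conv_params(c_in, c_out)              # 64 * 128 * 9 = 73728
    sep = dw_separable_params(c_in, c_out)      # 64 * 9 + 64 * 128 = 8768
    print(std, sep, round(std / sep, 1))        # -> 73728 8768 8.4
```

For these sizes the factored form needs roughly 8.4× fewer weights, which is the kind of saving that makes the block attractive in a mobile-oriented backbone such as MobileViT.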
Funding: National Natural Science Foundation of China (52072293)
Citation:
段高乐,王长元,吴恭朴,王红艳.基于多模态数据融合的飞行员注视区域分类.计算机系统应用,2024,33(11):1-14
DUAN Gao-Le, WANG Chang-Yuan, WU Gong-Pu, WANG Hong-Yan. Pilot's Gaze Zone Classification Based on Multi-modal Data Fusion. Computer Systems & Applications, 2024, 33(11): 1-14