Received: May 24, 2024; Revised: June 17, 2024
Abstract: Group activity recognition (GAR) is an active research area in computer vision that aims to infer the overall activity of a group from the actions of multiple individuals and the interactions among them. However, because individual interaction relationships, the closeness of those connections, and the key actors in an activity are all difficult to determine, existing methods tend to focus on individual person features and neglect their connection to the scene context. To address this problem, a GAR reasoning model based on global-individual feature fusion, GIFFNet (global-individual feature fusion network), is proposed. Through its global-individual feature fusion (GIFF) module, GIFFNet focuses on key information while integrating scene context with individual person features, yielding more representative fused features that compensate for the scene information otherwise missing when predicting group activity. GIFFNet then uses the fused features to compute an interaction relationship graph among the people in the scene and applies a graph convolutional network (GCN) for training and for predicting the group activity category. In addition, to address sample imbalance in the datasets, GIFFNet optimizes the loss function with a dynamic weight assignment strategy. Experimental results show that GIFFNet achieves a multi-class classification accuracy (MCA) of 93.8% and 96.1% on the Volleyball and Collective Activity datasets, respectively, and a mean per-class accuracy (MPCA) of 93.9% and 95.8%, respectively, outperforming other existing deep learning methods. By fusing features, GIFFNet provides more representative features for activity classification and effectively improves GAR accuracy.
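The two reported metrics can diverge on imbalanced data, which is why both are quoted. The following is a minimal sketch of how MCA and MPCA are commonly computed, assuming MCA is overall accuracy and MPCA is the unweighted mean of per-class accuracies. The function names and the toy labels are hypothetical illustrations and are not taken from the paper or its datasets.

```python
from collections import defaultdict


def mca(y_true, y_pred):
    """Multi-class classification accuracy: fraction of all samples predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mpca(y_true, y_pred):
    """Mean per-class accuracy: average of each class's own accuracy, unweighted."""
    total = defaultdict(int)  # number of samples per ground-truth class
    hit = defaultdict(int)    # number of correct predictions per ground-truth class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hit[t] += 1
    return sum(hit[c] / total[c] for c in total) / len(total)


# Hypothetical, deliberately imbalanced toy labels (8 samples of one class, 2 of another).
y_true = ["spike"] * 8 + ["set"] * 2
y_pred = ["spike"] * 8 + ["spike", "set"]

print(mca(y_true, y_pred))   # 9 of 10 samples correct
print(mpca(y_true, y_pred))  # mean of 8/8 and 1/2
```

In this toy case the rare class drags MPCA well below MCA, so a model that scores similarly on both metrics is performing evenly across classes.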
Keywords: group activity recognition; scene context; feature fusion; attention mechanism; dynamic loss function
Funding: National Natural Science Foundation of China (41975183, 41875184)
Cite this article:
CHENG Yong, CHENG Yao, WANG Jun, YANG Ling, XU Xiao-Long, GAO Yuan-Yuan, ZHANG Kai-Hua. Group Activity Recognition Based on Global-individual Feature Fusion. Computer Systems Applications,,():1-12