Abstract: Group activity recognition (GAR) is a highly researched area of computer vision that aims to recognize the overall behavior arising from the actions of, and interactions among, multiple individuals. However, because it is difficult to determine the interaction relationships between individuals, the strength of their connections, and the key actor, current methods often focus on individual character features while neglecting their connections with the scene context. To address this issue, a novel reasoning model for GAR, GIFFNet, is proposed based on global-individual feature fusion (GIFF). To compensate for the lack of scene information in predicting group activity, GIFFNet constructs a GIFF module that focuses on key information and effectively integrates scene context with individual character features, yielding more representative fused features. GIFFNet then uses the fused features to compute an interaction relationship graph between the characters in the scene and applies a graph convolutional network (GCN) to train on and predict group activity categories. In addition, to address sample imbalance in the datasets, GIFFNet optimizes the loss function with a strategy that dynamically assigns weights. Experimental results demonstrate that GIFFNet achieves a multi-class classification accuracy (MCA) of 93.8% and 96.1% on the Volleyball and Collective Activity datasets, respectively, and a mean per-class accuracy (MPCA) of 93.9% and 95.8%, outperforming other existing deep learning methods. Through feature fusion, GIFFNet provides features with stronger representational power for activity classification, which effectively improves GAR accuracy.
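The abstract does not specify GIFFNet's fusion operator, relation-graph formula, or weighting rule, so the following is only a minimal pure-Python sketch of the general pipeline it describes (fuse features, build an actor relation graph, apply a GCN, classify, and train with a class-weighted loss). It rests on several stated assumptions, none taken from the paper: fusion is plain concatenation of each actor vector with a global scene vector; the relation graph is a row-wise softmax over pairwise dot products (a common choice in graph-based GAR, not necessarily GIFFNet's); one GCN layer computes ReLU(G X W); actor nodes are max-pooled into a group vector; and the "dynamic" weights are recomputed from each batch's label counts as w_c = N / (K n_c). All function names (`fuse`, `relation_graph`, `gcn_layer`, `group_logits`, `batch_class_weights`, `weighted_cross_entropy`) are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(v):
    """Numerically stable softmax of a list of scores."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(actor_feats, scene_feat):
    """Hypothetical fusion: concatenate the global scene vector onto every actor vector."""
    return [list(f) + list(scene_feat) for f in actor_feats]


def relation_graph(x):
    """Assumed actor-actor graph: each row is a softmax over dot products with all actors."""
    return [softmax([sum(a * b for a, b in zip(xi, xj)) for xj in x]) for xi in x]


def gcn_layer(graph, x, weight):
    """One graph-convolution step, ReLU(G @ X @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(graph, x), weight)]


def group_logits(node_feats, cls_weight):
    """Max-pool actor nodes into one group vector, then apply a linear classifier."""
    pooled = [max(col) for col in zip(*node_feats)]
    return matmul([pooled], cls_weight)[0]


def batch_class_weights(labels, num_classes):
    """Assumed dynamic weights from the current batch: w_c = N / (K * n_c); absent classes get 0."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Cross-entropy with each sample scaled by its class weight, averaged by total weight."""
    num = den = 0.0
    for logits, y in zip(logits_batch, labels):
        w = weights[y]
        num += -w * math.log(softmax(logits)[y])
        den += w
    return num / den


if __name__ == "__main__":
    # Toy batch: two clips, each with three actors (2-dim features) and a 1-dim scene vector.
    w_gcn = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # fused dim 3 -> hidden dim 2
    w_cls = [[1.0, -1.0], [-1.0, 1.0]]            # hidden dim 2 -> 2 classes
    clips = [([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], [0.5]),
             ([[0.0, 1.0], [0.2, 0.8], [1.0, 0.0]], [0.2])]
    labels = [0, 1]
    logits = []
    for actors, scene in clips:
        x = fuse(actors, scene)
        h = gcn_layer(relation_graph(x), x, w_gcn)
        logits.append(group_logits(h, w_cls))
    weights = batch_class_weights(labels, num_classes=2)
    print(weighted_cross_entropy(logits, labels, weights))
```

The weighting rule here mirrors the common "balanced" inverse-frequency heuristic: rare classes receive larger weights, so misclassifying them costs more. With labels `[0, 0, 0, 1]` and two classes, the weights are 4/6 for class 0 and 2 for class 1. In a real model the weights would be learned with a framework such as PyTorch rather than fixed by hand.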