Abstract: Crowd behavior recognition has important applications in public safety and related fields. Existing studies have examined how factors such as crowd emotion, crowd type, crowd density, and the social and cultural background of a crowd influence crowd behavior, but mostly in isolation; few models consider these factors jointly, which limits model performance. This study considers the correlations among the physical, social, emotional and personality, and cultural background features of a crowd, together with the combined influence of these features on crowd behavior. On this basis, a crowd behavior recognition model that integrates multiple features and time series is proposed. The model uses two parallel network layers to handle separately the influence of multi-feature correlation and of time-series dependence on crowd behavior. To improve interpretability, the network layer fuses a structural causal model (SCM) with a causal graph network (CGN) based on a graph neural network (GNN). Experiments on the motion and emotion dataset (MED), including comparisons with state-of-the-art models, show that the proposed method successfully identifies crowd behavior and outperforms those models.