Abstract: Crowd density detection algorithms based on deep learning have made great progress, but their detection accuracy and robustness in complex real-world scenes still leave considerable room for improvement. Factors such as inconsistent object scales and background interference make crowd density detection in such scenes a challenging task. To address this problem, this study proposes a crowd density detection network based on multi-scale feature fusion. First, the network uses images of different resolutions to interactively extract coarse-grained and fine-grained crowd features and introduces a multi-level feature fusion mechanism to make full use of multi-scale information. Second, it applies spatial and channel attention mechanisms to increase the weight of crowd features, focus on the crowd regions of interest, and reduce background interference, thereby generating high-quality density maps. Experimental results on multiple typical public datasets show that the proposed network achieves better accuracy and robustness than representative crowd density detection methods.
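
As a rough illustration of the channel and spatial attention idea mentioned above, the following minimal sketch reweights a feature map stored as nested lists indexed `[channel][row][column]`. It is an assumption-laden simplification and not the network proposed in this study: the function names (`channel_attention`, `spatial_attention`, `attend`) are hypothetical, the attention weights come from plain averages passed through a sigmoid, and no learned layers are involved.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by the sigmoid of its global average.

    Assumption: a real network would pass the pooled vector through
    learned layers; here the pooled value is used directly.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each pixel by the sigmoid of its mean across channels.

    Assumption: a real network would typically apply a learned
    convolution to the pooled map before the sigmoid.
    """
    n_ch = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    mask = [
        [sigmoid(sum(feat[c][i][j] for c in range(n_ch)) / n_ch)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[feat[c][i][j] * mask[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(n_ch)
    ]


def attend(feat):
    """Apply channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(feat))


# Toy feature map: 2 channels, 2x2 pixels, one strong "crowd" response.
feat = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
out = attend(feat)
```

In this toy input, the pixel with the strong response receives a spatial weight above 0.5, while pixels with zero response receive exactly 0.5, so active regions are emphasized relative to the background.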