Abstract: Crowd density estimation is inaccurate in scenes with complex backgrounds and in scenes with dense, mutually occluding crowds. To address this, we propose a crowd density estimation method based on the fusion of enhanced YOLOv3 models. Heads and bodies in the data set are labeled separately to produce a head set and a body set, which are used to train two enhanced YOLOv3 models, YOLO-head and YOLO-body. Both models then run inference on the same test set, and their outputs are fused by taking the maximum value. The fusion method proves robust, with accuracy 4% higher than that of the original target detection and density map regression methods.
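The maximum-value fusion step can be illustrated with a minimal sketch. It assumes that each model returns a list of bounding boxes with confidence scores for an image, and that fusion means taking the larger of the two per-image counts; the abstract does not specify these details. The names `count_detections` and `fuse_counts_max`, the detection tuple layout, and the 0.5 confidence threshold are hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple

# Assumed detection layout: (x1, y1, x2, y2, confidence).
Detection = Tuple[float, float, float, float, float]


def count_detections(detections: Sequence[Detection],
                     conf_threshold: float = 0.5) -> int:
    """Count the boxes whose confidence reaches the threshold."""
    return sum(1 for det in detections if det[4] >= conf_threshold)


def fuse_counts_max(head_dets: Sequence[Detection],
                    body_dets: Sequence[Detection],
                    conf_threshold: float = 0.5) -> int:
    """Fuse two per-image counts by keeping the larger one (assumed fusion rule)."""
    head_count = count_detections(head_dets, conf_threshold)
    body_count = count_detections(body_dets, conf_threshold)
    return max(head_count, body_count)


# Hypothetical outputs for one image: the head model finds three people,
# while the body model misses one because of occlusion.
head_dets = [(10, 10, 30, 30, 0.9), (50, 12, 70, 32, 0.8), (90, 15, 110, 35, 0.7)]
body_dets = [(5, 10, 40, 120, 0.9), (45, 12, 80, 125, 0.6)]

fused_count = fuse_counts_max(head_dets, body_dets)
```

Under this assumption, a person whose body is occluded but whose head is visible still contributes to the estimate, and vice versa, which is one plausible reason for fusing the two detectors.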