Abstract: Detecting pedestrians on embedded devices at the edge can meet basic requirements for real-time performance, security, and privacy protection. The original CenterNet typically adopts a backbone such as Deep Layer Aggregation (DLA) or Hourglass, whose multi-level feature fusion is highly complex; this complexity strains the limited computing power of embedded devices and makes real-time detection difficult. To address this, the backbone is improved by employing BiFPN with weighted feature fusion to fuse its feature layers, which increases detection speed while maintaining detection accuracy. Furthermore, the Gaussian kernel distribution on the heatmap used during training is modified to better suit pedestrian detection, which reduces the accuracy loss caused by missed detections of occluded pedestrians. Experiments on a Jetson TX2 show that the improved method achieves an Average Precision (AP) of 0.774 for pedestrian detection with an inference time of 68 ms per image, which can meet the requirements of embedded devices for real-time detection.
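The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The first function follows the fast normalized fusion commonly associated with BiFPN (non-negative weights divided by their sum). The second draws a Gaussian peak with separate horizontal and vertical spreads; the abstract does not say how the kernel was modified, so the elliptical form and the names `fast_normalized_fusion`, `draw_gaussian`, `sigma_x`, and `sigma_y` are assumptions made only for illustration.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shape feature maps using non-negative, normalized weights."""
    # Clamp weights at zero (ReLU) so each input contributes non-negatively.
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)
    # Normalize by the weight sum; eps avoids division by zero.
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))

def draw_gaussian(heatmap, center, sigma_x, sigma_y):
    """Draw an axis-aligned Gaussian peak onto heatmap in place.

    Assumption: separate sigmas for x and y (an elliptical kernel) stand in
    for the unspecified kernel modification for tall, narrow pedestrians.
    """
    h, w = heatmap.shape
    cx, cy = center
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - cx) ** 2 / (2 * sigma_x ** 2)
                 + (ys - cy) ** 2 / (2 * sigma_y ** 2)))
    # Element-wise max keeps the stronger response where peaks overlap.
    np.maximum(heatmap, g, out=heatmap)
    return heatmap

# Usage: fuse two toy feature maps and draw one pedestrian-shaped peak.
fused = fast_normalized_fusion([np.ones((2, 2)), 3 * np.ones((2, 2))], [1.0, 1.0])
hm = draw_gaussian(np.zeros((9, 9)), center=(4, 4), sigma_x=1.0, sigma_y=2.0)
```

With `sigma_y` larger than `sigma_x`, the peak decays more slowly along the vertical axis, which roughly matches the aspect ratio of a standing pedestrian; in a trained network the fusion weights would be learned parameters rather than fixed values.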