Abstract: Road damage poses a serious threat to the service life and safety of roads, and early detection of road defects facilitates maintenance and repair. Traditional road defect detection typically relies on manual visual inspection and vehicle-mounted pavement monitoring systems, but the results of these methods depend heavily on the experience of road maintenance personnel. With the advancement of deep learning, a growing number of researchers have applied it to road defect detection, most commonly using the YOLO series of object detectors and their variants. However, most of these methods require post-processing operations, which hinder model optimization, impair robustness, and slow down inference. To address these issues, as well as the multi-scale challenges in road defect detection, an improved RT-DETR model is proposed. The backbone network is fine-tuned, and the MSaE attention module is introduced. In the encoder, GhostConv and the DySample module are used to optimize upsampling, while the ADown module optimizes downsampling. Comparative experiments are conducted on the public SVRDD dataset. The results show that the proposed method achieves 72.5% mAP@50 on SVRDD, 3.8 percentage points higher than the RT-DETR-R18 baseline, significantly enhancing road defect detection performance.
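The abstract does not name the post-processing step it refers to; for YOLO-style detectors it is typically non-maximum suppression (NMS), which removes overlapping duplicate boxes after inference. As a minimal sketch under that assumption, a greedy NMS can be written as follows. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are illustrative choices and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit candidates from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Hypothetical detections: two overlapping boxes and one separate box.
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(nms(demo_boxes, demo_scores))
```

The IoU threshold here is a hand-tuned hyperparameter applied outside the network, which is one way such a step can complicate end-to-end optimization and add latency. DETR-style detectors such as RT-DETR predict a set of boxes directly and do not need this step.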