Abstract: Small object detection in remote and complex scenes faces persistent challenges, including low detection accuracy and poor robustness, owing to the objects' small size, irregular shapes, weak textures, and high susceptibility to background interference. To address these issues, this study proposes an enhanced detection algorithm, remote-enhanced fusion YOLO (ReF-YOLO), which systematically optimizes the YOLO11 framework in three aspects: feature extraction, feature fusion, and detection head design. Specifically, a module named C3k2DCASC, which integrates channel attention and spatial modeling, is introduced to strengthen the backbone network's capacity to represent irregular objects. The L-Fuse structure, which combines same-scale features from the backbone with the efficient downsampling module SCDown, is introduced to improve the alignment of semantic and detail information. In addition, a high-resolution P2 detection branch is added to enhance the algorithm's perception and localization of extremely small objects. Experiments on VisDrone2019, a representative small object detection dataset, demonstrate that the proposed method improves mAP@0.5 by 4.9% over YOLO11n and achieves higher accuracy and stability across various small object detection tasks. These results validate the effectiveness and generalization capability of ReF-YOLO in remote and complex scenes.