Abstract: Underwater target detection is of practical significance for ocean exploration. Complex underwater environments make detection difficult, and occlusion and overlap limit the target features that can be extracted. To address these problems, this study proposes FERT-DETR, a network for underwater target detection. First, a feature extraction module, Faster EMA, replaces the BasicBlock of the ResNet18 backbone in RT-DETR. This markedly improves the extraction of underwater target features while reducing the model's parameter count and depth. Second, a cascaded group attention module, AIFI-CGA, is used in the encoder to reduce computational redundancy in multi-head attention and increase attention diversity. Finally, HS-FPN, a feature pyramid network with high-level feature filtering, replaces CCFM to achieve multi-level feature fusion and improve detection accuracy and robustness. Experimental results show that, compared with RT-DETR, FERT-DETR improves detection accuracy by 3.1% on the URPC2020 dataset and by 1.7% on the DUO dataset, while reducing the number of parameters by 14.7% and computational complexity by 9.2%. It effectively reduces missed and false detections of targets of different sizes in complex underwater environments.
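To illustrate how cascaded group attention can cut redundancy and diversify heads, here is a minimal pure-Python sketch. It assumes the cascaded group attention design popularized by EfficientViT, which the AIFI-CGA name suggests: each head attends over a different channel split of the input rather than the full feature, and each head's output is added to the next head's input split. The function names and the random weights standing in for learned projections are hypothetical. This is not the authors' AIFI-CGA implementation, which may differ in details such as head dimensions or extra token-mixing layers.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K to shape d x N
    scores = matmul(q, k_t)  # N x N similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)  # N x d weighted sum of values


def rand_matrix(rows, cols, rng):
    """Hypothetical stand-in for a learned projection matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def cascaded_group_attention(x, num_heads, rng):
    """x: N tokens with C features each; C must be divisible by num_heads."""
    n, c = len(x), len(x[0])
    assert c % num_heads == 0, "channels must split evenly across heads"
    d = c // num_heads
    outputs, prev = [], None
    for h in range(num_heads):
        # Each head sees only its own channel split, not the full feature.
        split = [row[h * d:(h + 1) * d] for row in x]
        if prev is not None:
            # Cascade: add the previous head's output to this head's input.
            split = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(split, prev)]
        wq, wk, wv = (rand_matrix(d, d, rng) for _ in range(3))
        head = attention(matmul(split, wq), matmul(split, wk), matmul(split, wv))
        outputs.append(head)
        prev = head
    # Concatenate the head outputs along channels, then apply an output projection.
    concat = [sum((out[i] for out in outputs), []) for i in range(n)]
    return matmul(concat, rand_matrix(c, c, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = rand_matrix(4, 8, rng)  # 4 tokens, 8 channels
    y = cascaded_group_attention(tokens, num_heads=2, rng=rng)
    print(len(y), len(y[0]))
```

Splitting channels across heads means each head's Q/K/V projections act on C/h rather than C features, which is where the computational saving comes from. The cascade gives later heads a progressively refined input, which is intended to make the heads' attention maps less alike.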