Abstract: Urban remote sensing images pose challenges for boundary segmentation due to their high resolution, diverse backgrounds, and intricate textures. Mainstream semantic segmentation models suffer from edge blurring, over-smoothed corners, and an inability to capture long-range dependencies. To address these challenges, ARD-UNet++, an enhanced model based on UNet++, is introduced. A 7×7 depthwise separable convolution reduces the parameter count while enabling denser feature extraction and broader contextual information capture. The SimAM parameter-free attention mechanism selectively emphasizes crucial features without adding parameters, effectively suppressing irrelevant information. Residual connections are integrated to help avoid local optima, and a Res-SimAM module replaces the standard convolution block in the upsampling nodes. Compared with UNet++, the proposed model achieves significant improvements on the UAVid and Potsdam datasets: mIoU increases of 6.77% and 1.79%, F1 increases of 4.71% and 1.17%, and OA increases of 4.99% and 0.98%, respectively. A comparative analysis against recent mainstream models further confirms its superior performance, positioning ARD-UNet++ as a promising solution for precise urban remote sensing image segmentation.
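To make the two key design choices concrete, the sketch below (a minimal NumPy illustration, not the paper's implementation; channel count 64 and the regularizer `lam=1e-4` are assumed for illustration) compares the weight count of a standard 7×7 convolution against a 7×7 depthwise separable convolution, and applies parameter-free SimAM-style attention, which gates each activation by a sigmoid of its inverse energy computed from the squared deviation to the channel mean.

```python
import numpy as np

def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out

def simam(x, lam=1e-4):
    """Parameter-free SimAM-style attention over a (C, H, W) feature map."""
    _, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)
    d = (x - mu) ** 2                             # squared deviation per position
    v = d.sum(axis=(1, 2), keepdims=True) / n     # channel-wise variance estimate
    e_inv = d / (4.0 * (v + lam)) + 0.5           # inverse energy of each neuron
    return x * (1.0 / (1.0 + np.exp(-e_inv)))     # sigmoid gating, no learned weights

# 7x7 depthwise separable conv vs. a standard 7x7 conv at 64 channels
std = conv_params(64, 64, 7)                  # 200,704 weights
dws = depthwise_separable_params(64, 64, 7)   # 7,232 weights

# SimAM preserves the feature-map shape and introduces no parameters
feat = np.random.default_rng(0).standard_normal((64, 32, 32))
out = simam(feat)
```

The large gap between `std` and `dws` is what lets the model afford a 7×7 receptive field at low cost, while `simam` re-weights features without any additional trainable weights.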