Abstract: Cloud and cloud shadow segmentation is a key task in remote sensing image processing, where conventional deep learning methods often suffer from missed detections, false detections, and loss of detail. To address these challenges, this study proposes a dual-branch architecture combining ResNet34 and MobileNetV3. First, MobileNetV3 serves as the secondary residual branch for preliminary feature extraction, reducing the computational burden and parameter count when processing simple features. The preliminary features are then passed to the primary residual branch, ResNet34, for deep feature extraction. To avoid the information loss caused by max pooling, a multi-scale strip convolutional pooling module (MS-SCPM) is designed, which extracts features through a variety of pooling and strip convolution operations to preserve important details. To fuse multi-scale information and detect small targets effectively, an attention-based dynamic pyramid multi-scale feature extraction module (ADPMFEM) is introduced, which flexibly captures key features while suppressing redundant information. The decoder uses a content-aware reassembly of features with attention (CWA) module, which weights the feature maps to optimize the upsampling process, improving edge recovery and segmentation accuracy. Finally, deformable convolutions are applied before pixel classification to further refine the segmentation results. Experimental results show that the proposed model performs strongly on the Biome 8, HRC-WHU, and SPARCS datasets, reaching a mean intersection over union (MIoU) of 79.19%, 90.41%, and 77.89%, respectively, outperforming existing methods. These results can support cloud and cloud shadow image analysis in remote sensing applications such as environmental monitoring, disaster assessment, and agricultural surveillance, improving data processing accuracy and efficiency.
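The abstract does not detail the internals of the MS-SCPM, but the core idea of strip pooling it builds on can be illustrated with a minimal NumPy sketch. The function below (`strip_pool` is a hypothetical name, not from the paper) averages each pixel over a 1×k horizontal window and a k×1 vertical window and fuses the two, preserving the spatial resolution that a stride-2 max pooling would discard.

```python
import numpy as np

def strip_pool(x, k=3):
    """Hypothetical sketch of directional strip pooling: average over 1xk
    (horizontal) and kx1 (vertical) windows with edge padding, so the
    output keeps the same spatial size as the input."""
    h, w = x.shape
    pad = k // 2
    # Horizontal strip: average each pixel with its k horizontal neighbours.
    xh = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    horiz = np.stack([xh[:, i:i + w] for i in range(k)]).mean(axis=0)
    # Vertical strip: average each pixel with its k vertical neighbours.
    xv = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    vert = np.stack([xv[i:i + h, :] for i in range(k)]).mean(axis=0)
    # Fuse the two directional summaries (simple average for illustration).
    return 0.5 * (horiz + vert)

feat = np.arange(16, dtype=float).reshape(4, 4)
out = strip_pool(feat)
print(out.shape)  # spatial size preserved, unlike stride-2 max pooling
```

In the actual module the directional aggregation would be learned strip convolutions combined with multi-scale pooling branches rather than fixed averages; this sketch only shows why elongated windows retain detail that square max pooling loses.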