Abstract: When extracting buildings from remote sensing images, the classic U-Net suffers from loss of information during encoding, poor adaptability to building targets at multiple scales, and weak connections between contextual features. To address these problems, this study proposes a deformable-residual-pyramid encoder-decoder network with multi-scale fusion. First, the original encoder is replaced by a deep encoding network and a down-sampling bypass network, which jointly extract high-level features of the building target. Second, a residual pyramid structure combined with deformable convolution is introduced at the penultimate node of the encoding network to improve the network's ability to recognize multi-scale features and blurred-edge features of buildings. Finally, the high- and low-level features are cascaded and fused layer by layer, and the building segmentation result is obtained at the end of the decoding network. Experimental results show that, compared with the original model, the improved model increases the F1-score and MIoU by 1.6% and 2.1%, respectively.
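The abstract reports F1-score and MIoU without defining them. As a minimal sketch, assuming the standard definitions for binary building/background segmentation (F1 as the harmonic mean of precision and recall on the building class, MIoU as the mean intersection-over-union across the two classes), the two metrics can be computed as follows. The function names and the toy masks are hypothetical and are not taken from the paper.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for binary masks given as flat 0/1 sequences."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 on the building class: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_iou(tp, fp, fn, tn):
    """Mean IoU over the building and background classes (assumed definition)."""
    ious = []
    # (intersection, union) for the building class, then the background class
    for inter, union in ((tp, tp + fp + fn), (tn, tn + fp + fn)):
        if union:  # skip a class that is absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, 1 = building, 0 = background (illustrative data only)
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(pred, truth)
f1 = f1_score(tp, fp, fn)
miou = mean_iou(tp, fp, fn, tn)
```

Under these assumed definitions, both metrics penalize false positives and false negatives on the building class, while MIoU also accounts for how well the background is segmented.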