Abstract: In road extraction from high-resolution remote sensing images, local disconnections and loss of detail are common problems caused by complex backgrounds and by trees and buildings occluding roads during image formation. To address these problems, this study proposes MSDANet, a road extraction model based on a multi-scale difference aggregation mechanism. The model adopts an encoder-decoder structure and uses the Res2Net module as the encoder backbone to capture fine-grained, multi-scale features and to enlarge the receptive field for feature extraction. In addition, a gated axial guidance module, combined with road morphological characteristics, is applied to strengthen the representation of road features and improve the connectivity of long-range roads. A multi-scale difference aggregation module between the encoder and decoder extracts and aggregates the difference information between shallow and deep features, and the aggregated features are fused with the decoded features through a feature fusion module to help the decoder restore road details accurately. The proposed method is evaluated on two high-resolution remote sensing datasets, DeepGlobe and CHN6-CUG, where MSDANet achieves F1 scores of 80.37% and 78.17% and IoU scores of 67.18% and 64.17%, respectively, outperforming the comparison models.
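To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the encoder-decoder layout, not the authors' implementation: the Res2Net backbone is replaced by plain convolutional stages, the gated axial guidance module is omitted, and the names `MSDAModule`, `MSDANetSketch`, and the channel sizes are illustrative assumptions meant only to show where the difference aggregation and feature fusion sit between encoder and decoder.

```python
# Hypothetical sketch of the architecture summarized in the abstract.
# Not the authors' code: Res2Net stages and the gated axial guidance
# module are simplified away; module names and channel sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_stage(cin, cout):
    # Stand-in for one encoder stage; the paper uses Res2Net blocks here.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=2, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )


class MSDAModule(nn.Module):
    """Illustrative multi-scale difference aggregation: combines the
    difference between a shallow feature map and an upsampled deep one."""

    def __init__(self, shallow_ch, deep_ch):
        super().__init__()
        self.proj = nn.Conv2d(deep_ch, shallow_ch, 1)   # match channel counts
        self.fuse = nn.Conv2d(shallow_ch, shallow_ch, 1)

    def forward(self, shallow, deep):
        deep_up = F.interpolate(deep, size=shallow.shape[2:],
                                mode="bilinear", align_corners=False)
        return self.fuse(shallow - self.proj(deep_up))  # difference information


class MSDANetSketch(nn.Module):
    """Encoder-decoder skeleton with difference aggregation between them."""

    def __init__(self):
        super().__init__()
        self.enc1 = conv_stage(3, 64)     # shallow features
        self.enc2 = conv_stage(64, 128)   # deep features
        self.msda = MSDAModule(64, 128)
        self.decoder = nn.Sequential(     # fuses features and restores roads
            nn.Conv2d(128 + 64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),          # single-channel road logit map
        )

    def forward(self, x):
        f1 = self.enc1(x)                 # shallow, higher resolution
        f2 = self.enc2(f1)                # deep, lower resolution
        diff = self.msda(f1, f2)          # aggregated difference features
        f2_up = F.interpolate(f2, size=f1.shape[2:],
                              mode="bilinear", align_corners=False)
        fused = torch.cat([f2_up, diff], dim=1)   # simplified feature fusion
        out = self.decoder(fused)
        return F.interpolate(out, size=x.shape[2:],
                             mode="bilinear", align_corners=False)
```

As a usage example, `MSDANetSketch()(torch.randn(1, 3, 256, 256))` returns a (1, 1, 256, 256) road logit map, which a sigmoid followed by thresholding would convert into a binary road mask.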