Abstract: Land cover classification of remote sensing images is crucial for urban planning, land use, environmental monitoring, and land cover temperature inversion. This study proposes a U-shaped Transformer network, U-BiFormer, to address two problems in remote sensing images: misclassification among similar land cover types and imbalance among land cover classes. Building upon BiFormer, the model employs a U-shaped decoder and uses the outputs of the decoder at every stage to predict the segmentation map, which improves its ability to capture detail and contextual information and yields better segmentation of similar classes. The hybrid attention module of the U-shaped decoder is also modified to increase the proportion of current-stage features in the mixed features, so that the decoder focuses more on refining the features of the current stage, further improving segmentation of similar classes. In addition, a hybrid CE+Focal loss (cross-entropy plus focal loss) replaces the conventional cross-entropy loss to address class imbalance. Experiments on the large-scale GID remote sensing image dataset show that the proposed method segments similar classes better and outperforms current mainstream models, with an accuracy (Acc) of 81.99% and a mean intersection over union (mIoU) of 71.04%.
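
As an illustration of the hybrid loss idea, the following is a minimal sketch of a per-pixel cross-entropy plus focal loss in plain Python. The abstract does not state how the two terms are combined, so the weighted sum, the weights `ce_weight` and `focal_weight`, the focusing parameter `gamma`, and the function names are all assumptions made for illustration, not the authors' implementation.

```python
import math


def log_prob_of_target(logits, target):
    """Numerically stable log-softmax probability of the target class."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[target] - log_sum


def ce_focal_loss(logits, target, gamma=2.0, ce_weight=1.0, focal_weight=1.0):
    """Hypothetical CE+Focal loss for one pixel.

    CE    = -log(p_t)
    Focal = (1 - p_t)^gamma * CE   (down-weights easy, well-classified pixels)
    The weighted sum below is an assumed combination rule.
    """
    log_p_t = log_prob_of_target(logits, target)
    p_t = math.exp(log_p_t)
    ce = -log_p_t
    focal = (1.0 - p_t) ** gamma * ce
    return ce_weight * ce + focal_weight * focal


def mean_ce_focal_loss(batch_logits, targets, **kwargs):
    """Average the per-pixel loss over a flat list of pixels."""
    losses = [ce_focal_loss(l, t, **kwargs) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)
```

Because the focal term scales cross-entropy by `(1 - p_t) ** gamma`, confidently classified pixels (typically from frequent classes) contribute little, while hard pixels keep nearly their full cross-entropy weight, which is the usual motivation for using focal loss under class imbalance.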