Abstract: Because objects in remote sensing images are uncertain and feature information differs significantly between images, existing super-resolution methods yield poor reconstruction results. This study therefore proposes NG-MAT, a model that combines the Swin Transformer with the N-gram model for super-resolution of remote sensing images. First, multiple attention modules are connected in parallel on the branch of the original Transformer to extract global feature information and activate more pixels. Second, the N-gram model from natural language processing is applied to image processing, using a trigram model to enhance information interaction between windows. On the selected dataset, the proposed method achieves peak signal-to-noise ratios (PSNR) of 34.68 dB, 31.03 dB, and 28.99 dB and structural similarity (SSIM) indices of 0.9266, 0.8444, and 0.7734 at upscaling factors of 2, 3, and 4, respectively. Experimental results show that the proposed method outperforms comparable methods on these metrics.
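
As a point of reference for the reported figures, the following is a minimal sketch of the standard peak signal-to-noise ratio definition, 10·log10(MAX² / MSE). It is not the evaluation code of this study: the function name `psnr`, the `max_value` parameter, and the assumption of 8-bit pixel values are illustrative, and details such as the color channel or border cropping used for evaluation are not specified in the abstract.

```python
import numpy as np


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Hypothetical helper for illustration; max_value=255.0 assumes 8-bit data.
    """
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    if ref.shape != rec.shape:
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels (and channels, if present).
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


if __name__ == "__main__":
    # Synthetic example: a random 8-bit image and a noisy copy of it.
    rng = np.random.default_rng(0)
    high_res = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
    noisy = np.clip(high_res + rng.normal(0.0, 5.0, size=high_res.shape), 0, 255)
    print(f"PSNR: {psnr(high_res, noisy):.2f} dB")
```

Under this definition a higher value means a smaller mean squared error against the reference image, which is why the reported PSNR decreases as the upscaling factor grows from 2 to 4.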