Lightweight Image Super-resolution Reconstruction Based on Hybrid Generalization Transformer

Funding: General Program of the Natural Science Foundation of Liaoning Province (2022-MS-276)

    Abstract:

    Transformer-based methods, relying on the self-attention mechanism, have shown remarkable performance in image super-resolution reconstruction. However, self-attention also incurs a very high computational cost. To address this issue, a lightweight image super-resolution reconstruction model based on a hybrid generalization Transformer is proposed. The model is built on the SwinIR network architecture. First, a rectangular window self-attention (RWSA) mechanism is adopted: different heads use horizontal and vertical rectangular windows in place of the traditional square window pattern, integrating features across different windows. Second, a recursive generalized self-attention (RGSA) mechanism is introduced to recursively aggregate input features into representative feature maps, after which cross-attention extracts global information; RWSA and RGSA are combined alternately to exploit global contextual information more effectively. Finally, to activate more pixels for better reconstruction, a channel attention mechanism and a self-attention mechanism extract features from the input image in parallel. Results on five benchmark datasets show that the model achieves better reconstruction performance while keeping its parameters lightweight.
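
    The rectangular window partition underlying RWSA can be illustrated with a minimal, dependency-free sketch. The 16x16 feature map and the 4x16 / 16x4 window shapes below are illustrative assumptions, not the paper's exact configuration; in the real model, self-attention would be computed inside each window.

    ```python
    # Sketch of RWSA's window partition: instead of one square window size,
    # different attention heads tile the feature map with horizontal (wide)
    # and vertical (tall) rectangular windows.

    def partition_windows(feat, win_h, win_w):
        """Split an H x W grid (list of lists) into win_h x win_w windows.

        Returns a list of windows, each flattened to win_h * win_w entries.
        """
        H, W = len(feat), len(feat[0])
        assert H % win_h == 0 and W % win_w == 0, "windows must tile the map"
        windows = []
        for top in range(0, H, win_h):
            for left in range(0, W, win_w):
                window = [feat[top + i][left + j]
                          for i in range(win_h) for j in range(win_w)]
                windows.append(window)
        return windows

    # Toy 16x16 "feature map" whose entries record their own coordinates.
    feat = [[(r, c) for c in range(16)] for r in range(16)]

    horizontal = partition_windows(feat, 4, 16)  # wide windows, one head group
    vertical = partition_windows(feat, 16, 4)    # tall windows, another group

    print(len(horizontal), len(vertical))  # -> 4 4
    ```

    Because the wide and tall tilings cut the map along different axes, alternating or mixing them across heads lets features interact beyond any single square window's boundary, which is the intuition the abstract describes.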

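    The abstract's final component, running channel attention in parallel with self-attention and fusing the two branches, can be sketched in miniature as well. The sigmoid gating and the identity stub standing in for the Transformer branch are illustrative assumptions, not the paper's actual modules.

    ```python
    # Toy sketch of a parallel channel-attention + self-attention block:
    # each channel is rescaled by a gate derived from its global average
    # (channel attention), a stubbed attention branch passes features
    # through, and the two branch outputs are summed element-wise.
    import math

    def channel_attention(x):
        """x: list of channels, each a flat list of spatial values."""
        out = []
        for ch in x:
            squeeze = sum(ch) / len(ch)               # global average pooling
            gate = 1.0 / (1.0 + math.exp(-squeeze))   # sigmoid gating weight
            out.append([v * gate for v in ch])        # rescale the channel
        return out

    def self_attention_stub(x):
        # Placeholder for the Transformer branch; identity for illustration.
        return [list(ch) for ch in x]

    def parallel_block(x):
        ca, sa = channel_attention(x), self_attention_stub(x)
        # Fuse the two branches by element-wise summation.
        return [[a + b for a, b in zip(c1, c2)] for c1, c2 in zip(ca, sa)]
    ```

    The design intent mirrored here is that the channel branch reweights whole feature maps from global statistics, "activating" pixels the window-limited attention branch would otherwise underuse.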
Cite this article:

刘俊辰, 张文波, 杨大为. 基于混合泛化Transformer的轻量化图像超分辨率重建. 计算机系统应用, 2025, 34(3): 143-151.
History
  • Received: 2024-07-30
  • Revised: 2024-09-19
  • Published online: 2025-01-21
Copyright: Institute of Software, Chinese Academy of Sciences