Lightweight Image Super-resolution Network Based on Hierarchical Progressive Fusion of Features
Authors: Zhang Hao, Ma Ji, Yuan Jiang

    Abstract:

    In recent years, with the development of deep learning techniques, convolutional neural networks (CNNs) and Transformers have made significant progress in image super-resolution (SR). However, to extract the global features of an image, most previous methods gradually expand the receptive field by stacking a single operator and repeating its computation. To better exploit global information, this study proposes explicitly modeling local, regional, and global features. Specifically, the local, regional-local, and global-regional information of an image is extracted and fused in a hierarchical, progressive manner through channel-attention-enhanced convolution, a dual-branch parallel architecture combining a window-based Transformer with a CNN, and a dual-branch parallel architecture combining a standard Transformer with a window-based Transformer. In addition, a hierarchical feature fusion method is designed to fuse the local information extracted by the CNN branch with the regional information extracted by the window-based Transformer. Extensive experiments show that the proposed network achieves better results in lightweight SR. For example, in the 4× upscaling experiments on the Manga109 dataset, the peak signal-to-noise ratio (PSNR) of the proposed network is 0.51 dB higher than that of SwinIR.
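    The reported gain is measured in PSNR, which is derived from the mean squared error (MSE) between the reconstructed and reference images: PSNR = 10·log10(MAX²/MSE). A minimal stdlib sketch of that formula (the `psnr` helper and the flat-list image representation are illustrative, not the authors' evaluation code):

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given here as flat lists of pixel intensities in [0, max_val]."""
    assert len(ref) == len(test) and len(ref) > 0
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# A uniform error of 16 gray levels gives MSE = 256, i.e. about 24.05 dB.
print(psnr([0, 0, 0, 0], [16, 16, 16, 16]))
```

    Note that SR benchmarks such as those cited here typically compute PSNR on the luminance (Y) channel after cropping a border of `scale` pixels; those evaluation details are omitted in this sketch.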

    References
    [1] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6000–6010.
    [2] Chen HT, Wang YH, Guo TY, et al. Pre-trained image processing Transformer. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 12294–12305.
    [3] Liang JY, Cao JZ, Sun GL, et al. SwinIR: Image restoration using swin Transformer. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE, 2021. 1833–1844.
    [4] Chen Z, Zhang YL, Gu JJ, et al. Cross aggregation Transformer for image restoration. Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022. 1847.
    [5] Dong C, Loy CC, He KM, et al. Learning a deep convolutional network for image super-resolution. Proceedings of the 13th European Conference on Computer Vision. Zurich: Springer, 2014. 184–199.
    [6] Lim B, Son S, Kim H, et al. Enhanced deep residual networks for single image super-resolution. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017. 1132–1140.
    [7] Zhang YL, Li KP, Li K, et al. Image super-resolution using very deep residual channel attention networks. Proceedings of the 15th European Conference on Computer Vision and Pattern Recognition. Munich: Springer, 2018. 294–310.
    [8] Li YW, Fan YC, Xiang XY, et al. Efficient and explicit modelling of image hierarchies for image restoration. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 18278–18289.
    [9] Chen Z, Zhang YL, Gu JJ, et al. Dual aggregation Transformer for image super-resolution. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023. 12278–12287.
    [10] Ahn N, Kang B, Sohn KA. Fast, accurate, and lightweight super-resolution with cascading residual network. Proceedings of the 15th European Conference on Computer Vision. Munich: Springer, 2018. 256–272.
    [11] Chen XY, Wang XT, Zhou JT, et al. Activating more pixels in image super-resolution Transformer. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 22367–22377.
    [12] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale. Proceedings of the 9th International Conference on Learning Representations. OpenReview.net, 2021.
    [13] Lu ZS, Li JC, Liu H, et al. Transformer for single image super-resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New Orleans: IEEE, 2022. 456–465.
    [14] Zhang XD, Zeng H, Guo S, et al. Efficient long-range attention network for image super-resolution. Proceedings of the 17th European Conference on Computer Vision. Tel Aviv: Springer, 2022. 649–667.
    [15] Ioffe S. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models. Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 1942–1950.
    [16] Zhang YL, Tian YP, Kong Y, et al. Residual dense network for image super-resolution. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 2472–2481.
    [17] Tai Y, Yang J, Liu XM. Image super-resolution via deep recursive residual network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 2790–2798.
    [18] Luo XT, Xie Y, Zhang YL, et al. LatticeNet: Towards lightweight image super-resolution with lattice block. Proceedings of the 16th European Conference on Computer Vision. Glasgow: Springer, 2020. 272–289.
    [19] Hui Z, Wang XM, Gao XB. Fast and accurate single image super-resolution via information distillation network. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 723–731.
    [20] Kim J, Lee JK, Lee KM. Deeply-recursive convolutional network for image super-resolution. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 1637–1645.
    [21] Hui Z, Gao XB, Yang YC, et al. Lightweight image super-resolution with information multi-distillation network. Proceedings of the 27th ACM International Conference on Multimedia. Nice: ACM, 2019. 2024–2032.
    [22] Sun B, Zhang YL, Jiang SY, et al. Hybrid pixel-unshuffled network for lightweight image super-resolution. Proceedings of the 37th AAAI Conference on Artificial Intelligence. Washington: AAAI, 2023. 2375–2383.
    [23] Liu Z, Lin YT, Cao Y, et al. Swin Transformer: Hierarchical vision Transformer using shifted windows. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021. 9992–10002.
    [24] Dong XY, Bao JM, Chen DD, et al. CSWin Transformer: A general vision Transformer backbone with cross-shaped windows. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 12114–12124.
    [25] Wang H, Chen XH, Ni BB, et al. Omni aggregation networks for lightweight image super-resolution. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 22378–22387.
    [26] He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 770–778.
    [27] Shi WZ, Caballero J, Huszár F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 1874–1883.
    [28] Tan MX, Le QV. EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the 36th International Conference on Machine Learning. Long Beach: PMLR, 2019. 6105–6114.
    [29] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 7132–7141.
    [30] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 2818–2826.
    [31] Zhou DW, Li WB, Li JX, et al. A lightweight multi-scale channel attention network for image super-resolution reconstruction. Acta Electronica Sinica, 2022, 50(10): 2336–2346. (in Chinese)
    [32] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv:1511.07122, 2016.
    [33] Agustsson E, Timofte R. NTIRE 2017 challenge on single image super-resolution: Dataset and study. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017. 1122–1131.
    [34] Bevilacqua M, Roumy A, Guillemot C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. Proceedings of the 2012 British Machine Vision Conference. Surrey: BMVA Press, 2012. 1–10.
    [35] Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations. Proceedings of the 7th International Conference on Curves and Surfaces. Avignon: Springer, 2012. 711–730.
    [36] Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver: IEEE, 2001. 416–423.
    [37] Huang JB, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015. 5197–5206.
    [38] Matsui Y, Ito K, Aramaki Y, et al. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 2017, 76(20): 21811–21838.
    [39] Wang Z, Bovik AC, Sheikh HR, et al. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 2004, 13(4): 600–612.
    [40] Kingma DP, Ba J. Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations. San Diego, 2015.
    [41] Paszke A, Gross S, Chintala S, et al. Automatic differentiation in PyTorch. Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 1–4.
    [42] Muqeet A, Hwang J, Yang SB, et al. Multi-attention based ultra lightweight image super-resolution. Proceedings of the 2020 European Conference on Computer Vision. Glasgow: Springer, 2020. 103–118.
    [43] Li WB, Zhou K, Qi L, et al. LAPAR: Linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond. Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 1708.
    [44] Wang LG, Dong XY, Wang YQ, et al. Exploring sparsity in image super-resolution for efficient inference. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 4915–4924.
    [45] Gao GW, Wang ZX, Li JC, et al. Lightweight bimodal network for single-image super-resolution via symmetric CNN and recursive Transformer. Proceedings of the 31st International Joint Conference on Artificial Intelligence. Vienna: ijcai.org, 2022. 913–919.
    [46] Gendy G, Sabor N, Hou JC, et al. A simple Transformer-style network for lightweight image super-resolution. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Vancouver: IEEE, 2023. 1484–1494.
    [47] Li A, Zhang L, Liu Y, et al. Feature modulation Transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023. 12480–12490.
    [48] Zhou XQ, Huang HB, He R, et al. MSRA-SR: Image super-resolution Transformer with multi-scale shared representation acquisition. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023. 12665–12676.
Cite this article:

Zhang H, Ma J, Yuan J. Lightweight image super-resolution network based on hierarchical progressive fusion of features. Computer Systems & Applications, 2025, 34(1): 118–127. (in Chinese)
History
  • Received: 2024-06-05
  • Revised: 2024-06-28
  • Published online: 2024-11-15