Aerial Scene Classification by Fusion of Dual-branch Attention and FasterNet

Authors: 杨本臣, 曲业田, 金海波

Funding: National Natural Science Foundation of China (62173171); Young Scientists Fund of the National Natural Science Foundation of China (41801368)

Abstract:

High-resolution aerial images contain many scene categories with high inter-class similarity. Classic deep-learning-based classification methods run inefficiently because feature extraction generates redundant floating-point operations. FasterNet improves efficiency through partial convolution, but this weakens the model's feature extraction ability and hence its classification accuracy. To address these problems, this study proposes a hybrid classification method that integrates FasterNet with an attention mechanism. First, a "cross-shaped convolution module" partially extracts scene features to improve the model's efficiency. Then, a dual-branch attention mechanism fusing coordinate attention and channel attention strengthens the model's feature extraction. Finally, a residual connection between the "cross-shaped convolution module" and the dual-branch attention module lets the network learn more task-relevant features, improving classification accuracy while reducing computational cost. Experimental results show that, compared with existing deep-learning-based classification models, the proposed method achieves short inference time and high accuracy: it has 19M parameters and an average inference time of 7.1 ms per image, and it reaches 96.12%, 98.64%, 95.42%, and 97.87% accuracy on the public datasets NWPU-RESISC45, EuroSAT, VArcGIS (10%), and VArcGIS (20%), exceeding FasterNet by 2.06%, 0.77%, 1.34%, and 0.65%, respectively.
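The abstract specifies the block's structure (partial convolution, two attention branches, residual connection) but not its implementation. As a rough illustration of why partial convolution is cheap: at the common partial ratio r = 1/4, a 3x3 partial convolution touches only c/4 of the c input channels, so it costs about (1/4)^2 = 1/16 of a full 3x3 convolution's FLOPs. The following minimal PyTorch sketch wires such a partial convolution to a dual-branch attention module (coordinate attention plus SE-style channel attention, fused by addition) behind a residual connection; the module names, the 1/4 ratio, the reduction factor of 16, and the additive fusion are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """FasterNet-style partial convolution: convolve only the first
    dim // n_div channels and pass the remaining channels through."""
    def __init__(self, dim, n_div=4):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_keep = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

class CoordinateAttention(nn.Module):
    """Coordinate attention: pool along H and W separately so the
    attention weights retain positional information."""
    def __init__(self, dim, reduction=16):
        super().__init__()
        mid = max(8, dim // reduction)
        self.conv1 = nn.Conv2d(dim, mid, 1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, dim, 1)
        self.conv_w = nn.Conv2d(mid, dim, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
        y = self.act(self.conv1(torch.cat([x_h, x_w], dim=2)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (b, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * a_h * a_w

class ChannelAttention(nn.Module):
    """SE-style channel attention: global average pooling + bottleneck MLP."""
    def __init__(self, dim, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)

class HybridBlock(nn.Module):
    """Partial conv -> dual-branch attention (branches fused by addition),
    with a residual connection around the whole block."""
    def __init__(self, dim):
        super().__init__()
        self.pconv = PartialConv(dim)
        self.coord = CoordinateAttention(dim)
        self.chan = ChannelAttention(dim)

    def forward(self, x):
        y = self.pconv(x)
        y = self.coord(y) + self.chan(y)  # fuse the two attention branches
        return x + y                      # residual connection

block = HybridBlock(64)
print(block(torch.randn(2, 64, 56, 56)).shape)  # torch.Size([2, 64, 56, 56])

Stacking such blocks at several resolutions, FasterNet-style, would yield the kind of hybrid backbone the abstract describes; the residual path lets the attention branches refine, rather than replace, the partially convolved features.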

Cite this article:

杨本臣, 曲业田, 金海波. Aerial Scene Classification by Fusion of Dual-branch Attention and FasterNet. 计算机系统应用 (Computer Systems & Applications), 2024, 33(5): 15–27.
History:
  • Received: 2023-11-30
  • Revised: 2023-12-29
  • Published online: 2024-04-07