Image Semantic Segmentation Based on Edge Features and Attention Mechanism
Author:
Funding: National Natural Science Foundation of China (41975183)

    Abstract:

    In semantic segmentation tasks, the downsampling process of the encoder reduces resolution and discards spatial detail, so segmentation at object edges becomes discontinuous or incorrect, degrading overall segmentation performance. To address this problem, an image semantic segmentation model based on edge features and attention mechanisms, EASSNet, is proposed. First, an edge detection operator computes the edge map of the original image, and edge features are extracted from it through pooling-based downsampling and convolution. Next, the edge features are fused into the deep semantic features extracted by the encoder, restoring the spatial detail lost in the downsampled feature maps, while an attention mechanism strengthens meaningful information; this improves the accuracy of object-edge segmentation and, in turn, overall segmentation performance. Finally, EASSNet achieves mean intersection-over-union (mIoU) scores of 85.9% and 76.7% on the PASCAL VOC 2012 and Cityscapes datasets, respectively. Compared with current popular semantic segmentation networks, EASSNet shows clear advantages in both overall segmentation performance and object-edge segmentation quality.
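The pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the choice of the Sobel operator, the additive fusion, and the parameter-free sigmoid channel gate are all assumptions standing in for EASSNet's actual edge detector, fusion module, and attention module, which are specified only in the full text.

```python
import numpy as np

def sobel_edge_map(img):
    """Edge map of a grayscale image via the Sobel operator
    (one common choice; the abstract does not name the operator)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)  # gradient magnitude

def max_pool2(x):
    """2x2 max pooling: downsample the edge map toward the
    resolution of the encoder's deep feature map."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def channel_attention(features):
    """SE-style channel attention on a (C, H, W) tensor: squeeze by
    global average pooling, gate with a sigmoid (the FC layers of a
    real SE block are omitted in this sketch)."""
    squeeze = features.mean(axis=(1, 2))
    gate = 1.0 / (1.0 + np.exp(-squeeze))
    return features * gate[:, None, None]

# Toy run: an 8x8 image with a vertical step edge between columns 3 and 4.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edge = sobel_edge_map(img)                 # responds only at the step edge
edge_feat = max_pool2(max_pool2(edge))     # downsampled twice -> 2x2
deep_feat = np.ones((4, 2, 2))             # stand-in for encoder output
fused = deep_feat + edge_feat[None]        # naive additive fusion (assumption)
out = channel_attention(fused)             # reweighted fused features
```

A real implementation would learn the edge-feature convolutions and the attention weights end to end; the sketch only shows how an edge map can be brought to the encoder's resolution and injected into the deep features.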

Cite this article:

Wang J, Zhang JY, Cheng Y. Image semantic segmentation based on edge features and attention mechanism. Computer Systems & Applications, 2024, 33(7): 63–73.
History
  • Received: 2024-02-22
  • Revised: 2024-03-19
  • Published online: 2024-05-31