融合注意力与多尺度特征的城市街景实例分割
Author:
Funding: National Natural Science Foundation of China (41975183)

Instance Segmentation of Urban Streetscape Incorporating Attention and Multi-scale Feature
    摘要:

    城市街道场景实例分割算法可以显著提升城市环境感知和智能交通系统的准确性与效率, 针对城市街景行人和车辆之间相互遮挡和背景干扰严重等问题, 提出一种基于频率注意力机制和多尺度特征融合的实例分割模型FMInst. 首先, 构建一种高低频注意力机制进行交互编码从而增加高分辨率细节信息. 其次, 在Swin Transformer主干网络的Patch Merging层引入软池化操作, 减少特征信息损失, 有效提高小尺度目标分割结果. 最后, 结合MLP层构建多尺度的深度卷积, 有效增强目标局部信息提取, 提升实例分割精度. 在Cityscapes公共数据集进行对比实验, 结果表明FMInst的mAP提高1.2%, 达35.6%, 同时AP50提高2.2%, 达61.4%, 极大地改善实例分割的掩码质量和分割效果.
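The soft pooling mentioned in the abstract follows SoftPool (Stergiou et al., 2021, reference [32] of the paper): each pooling window is collapsed to a softmax-weighted average rather than a hard maximum. The sketch below is an illustrative stand-alone implementation of that formula, not FMInst's actual Patch Merging code; the window size and input are arbitrary examples.

```python
import math

def softpool2d(x, k=2):
    """SoftPool: reduce each k x k window to a softmax-weighted average,
    so large activations dominate but smaller ones still contribute,
    unlike hard max pooling which discards them entirely."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h, k):
        row = []
        for j in range(0, w, k):
            region = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            m = max(region)                          # subtract max for numerical stability
            wts = [math.exp(v - m) for v in region]  # softmax weights over the window
            row.append(sum(wt * v for wt, v in zip(wts, region)) / sum(wts))
        out.append(row)
    return out

feat = [[1.0, 3.0],
        [0.0, 2.0]]
pooled = softpool2d(feat, k=2)   # one 2x2 window -> one value
```

A useful property to check: the SoftPool output always lies between the window's average (1.5 here) and its maximum (3.0), which is why it loses less feature information than max pooling while still emphasizing strong responses.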

    Abstract:

    Algorithms for the instance segmentation of urban street scenes can significantly improve the accuracy and efficiency of urban environment perception and intelligent transportation systems. To address mutual occlusion between pedestrians and vehicles and severe background interference in urban street scenes, this study proposes an instance segmentation model, FMInst, based on a frequency attention mechanism and multi-scale feature fusion. First, a high/low-frequency attention mechanism is constructed for interactive encoding to enrich high-resolution detail information. Second, a soft pooling operation is introduced into the Patch Merging layer of the Swin Transformer backbone to reduce the loss of feature information and effectively improve the segmentation of small-scale targets. Finally, multi-scale depthwise convolutions are constructed in combination with the MLP layer, which effectively enhances the extraction of local target information and improves segmentation accuracy. Comparative experiments on the public Cityscapes dataset show that FMInst reaches an mAP of 35.6%, an improvement of 1.2%, and an AP50 of 61.4%, an improvement of 2.2%, greatly improving the mask quality and segmentation performance of instance segmentation.
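The multi-scale depthwise convolution described above can be sketched in minimal form: each channel is filtered independently (the defining property of a depthwise convolution) at several kernel sizes, and the branches are fused by summation. This is a hypothetical illustration, not the paper's implementation; the kernel sizes (3, 5, 7), uniform averaging weights, 1-D sequence layout, and summation fusion are all assumptions made for clarity.

```python
def depthwise_conv1d(channel, k):
    """Depthwise 1D convolution for a single channel with a uniform
    averaging kernel of width k (a stand-in for learned weights),
    zero-padded so the output length matches the input length."""
    pad = k // 2
    padded = [0.0] * pad + channel + [0.0] * pad
    return [sum(padded[i + t] for t in range(k)) / k for i in range(len(channel))]

def multiscale_depthwise(x, sizes=(3, 5, 7)):
    """Fuse depthwise convolutions at several kernel sizes so that both
    narrow and wide local neighborhoods contribute to every position,
    strengthening local feature extraction across object scales."""
    out = []
    for channel in x:                 # each channel filtered independently
        branches = [depthwise_conv1d(channel, k) for k in sizes]
        out.append([sum(vals) for vals in zip(*branches)])
    return out

tokens = [[1.0] * 16 for _ in range(4)]   # (channels, sequence length)
fused = multiscale_depthwise(tokens)
```

Because channels never mix, the parameter and compute cost grows with the number of scales rather than with the channel count squared, which is what makes multi-scale depthwise branches cheap enough to attach to an MLP layer.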

Cite this article:

WANG Jun, LYU Jia, CHENG Yong. Instance segmentation of urban streetscape incorporating attention and multi-scale feature. Computer Systems & Applications, 2025, 34(1): 90-99. (in Chinese)
History
  • Received: 2024-06-24
  • Revised: 2024-07-18
  • Published online: 2024-11-28