Semantic Segmentation of Street View Image Based on Attention and Multi-scale Features
Author: Hong Jun, Liu Xiaonan, Liu Zhenyu
Funding: Applied Basic Research Program of Liaoning Province (2023JH2/101300225)

Abstract:

This study addresses two problems that the traditional U-Net encounters in the semantic segmentation of street scene images: low segmentation accuracy for objects across multiple scales and weak correlation of image context features. To this end, it proposes AS-UNet, an improved U-Net semantic segmentation network, to segment street scene images accurately. First, the spatial and channel squeeze & excitation (scSE) attention module is integrated into the U-Net to guide the convolutional network, along both the channel and spatial dimensions, to focus on the semantic categories relevant to the segmentation task and thereby extract more effective semantic information. Second, to capture the global context of the image, multi-scale feature maps are aggregated for feature enhancement by embedding the atrous spatial pyramid pooling (ASPP) multi-scale feature fusion module into the U-Net. Finally, the cross-entropy loss and the Dice loss are combined to counter the imbalance among target categories in street scenes and further improve segmentation accuracy. Experiments show that, compared with the traditional U-Net, AS-UNet raises the mean intersection over union (MIoU) by 3.9% on the Cityscapes dataset and by 3.0% on the CamVid dataset, markedly improving the segmentation of street scene images.
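To make the three components concrete, the sketches below are minimal PyTorch implementations written for this page; they follow the cited formulations, not the authors' released code, and every layer size in them is an assumption. The scSE module recalibrates features along two dimensions: a channel branch (cSE) squeezes spatial information by global average pooling and excites informative channels, while a spatial branch (sSE) squeezes channels with a 1×1 convolution and excites informative positions. The reduction ratio of 16 and the additive merge of the two branches are common choices, not values taken from the paper.

```python
import torch
import torch.nn as nn

class SCSEBlock(nn.Module):
    """Concurrent spatial and channel squeeze & excitation (scSE) sketch."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # cSE: squeeze spatially (global average pool), excite per channel.
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # sSE: squeeze channels (1x1 conv to one map), excite per position.
        self.sse = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recalibrate along both dimensions and merge by element-wise sum
        # (element-wise max is another combination used in the literature).
        return x * self.cse(x) + x * self.sse(x)
```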
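ASPP aggregates multi-scale context by running parallel atrous (dilated) convolutions with different rates over the same feature map, adding an image-level pooling branch for global context, and fusing all branches with a 1×1 projection. The sketch below assumes the common DeepLabv3 configuration; the dilation rates (6, 12, 18) and the 256-channel output are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling sketch (DeepLabv3-style)."""
    def __init__(self, in_ch: int, out_ch: int = 256, rates=(6, 12, 18)):
        super().__init__()
        def branch(k: int, d: int) -> nn.Sequential:
            pad = 0 if k == 1 else d  # keep spatial size for 3x3 atrous convs
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
        # One 1x1 branch plus one 3x3 atrous branch per dilation rate.
        self.branches = nn.ModuleList([branch(1, 1)] + [branch(3, r) for r in rates])
        # Image-level branch: global average pool then 1x1 conv.
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.ReLU(inplace=True),
        )
        # Fuse the concatenated branches back to out_ch channels.
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        # Upsample the global-context branch back to the feature-map size.
        g = F.interpolate(self.pool(x), size=(h, w), mode="bilinear",
                          align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))
```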
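The combined loss pairs pixel-wise cross-entropy, which gives stable gradients but is dominated by frequent classes (road, sky), with a soft Dice term, which scores per-class overlap and is insensitive to class frequency, so small classes (pole, rider) are not drowned out. A sketch with an assumed 1:1 weighting between the two terms; the paper's exact weighting and any ignore-index handling are not given on this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CEDiceLoss(nn.Module):
    """Weighted sum of cross-entropy and multi-class soft Dice loss (sketch)."""
    def __init__(self, num_classes: int, ce_weight: float = 1.0,
                 dice_weight: float = 1.0, eps: float = 1e-6):
        super().__init__()
        self.num_classes = num_classes
        self.ce_weight = ce_weight
        self.dice_weight = dice_weight
        self.eps = eps

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (N, C, H, W); target: (N, H, W) long tensor of class indices
        # in [0, num_classes). No ignore label is assumed here.
        ce = F.cross_entropy(logits, target)
        probs = torch.softmax(logits, dim=1)
        one_hot = F.one_hot(target, self.num_classes).permute(0, 3, 1, 2).float()
        dims = (0, 2, 3)  # reduce over batch and spatial dims, keep classes
        inter = (probs * one_hot).sum(dims)
        union = probs.sum(dims) + one_hot.sum(dims)
        dice = (2 * inter + self.eps) / (union + self.eps)
        return self.ce_weight * ce + self.dice_weight * (1 - dice.mean())

# Usage: loss = CEDiceLoss(num_classes=19)(model(images), labels)
```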
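The reported metric, MIoU, averages over classes the ratio of intersection to union between prediction and ground truth, i.e. TP / (TP + FP + FN) read off the confusion matrix. A self-contained NumPy sketch of the metric as it is commonly computed on Cityscapes and CamVid, assuming integer label maps and ignoring labels outside [0, num_classes):

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """MIoU: mean over present classes of TP / (TP + FP + FN)."""
    mask = (gt >= 0) & (gt < num_classes)  # drop ignored / void labels
    # Confusion matrix via bincount: rows = ground truth, cols = prediction.
    cm = np.bincount(num_classes * gt[mask] + pred[mask],
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    denom = cm.sum(0) + cm.sum(1) - tp  # TP + FP + FN per class
    iou = tp / np.maximum(denom, 1)
    return float(iou[denom > 0].mean())  # average only classes that occur
```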

Cite this article:

Hong J, Liu XN, Liu ZY. Semantic segmentation of street view image based on attention and multi-scale features. Computer Systems & Applications, 2024, 33(5): 94–102.

History
  • Received: 2023-12-06
  • Revised: 2024-01-09
  • Published online: 2024-04-07