Image Semantic Segmentation Model Based on Dense Dilation Convolution

doi:10.15888/j.cnki.csa.008376

Authors: ZHANG Fu-Cai, XU Jian-Long, BAO Xiao-An

Affiliation: School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China

Fund Project: Zhejiang Provincial Key Research and Development Program (2020C03094)


Abstract:

Image semantic segmentation is challenging because of the complexity of the scenes to be parsed, the diversity of the segmented objects, and the differences in the spatial positions of those objects. To address these problems and improve segmentation accuracy, this paper proposes a double branch and multi-stage network (DBMSNet) based on dense dilated convolution. First, a backbone network extracts four feature maps with different resolutions (De1, De2, De3, and De4) from the input image. Next, a feature refinement (FR) module refines the De1 and De3 feature maps, and the refined output branch passes through a mixed dilation module (MDM) that encodes spatial location features, while the De4 branch passes through a pyramid pooling module (PPM) that encodes high-level, multi-scale semantic features. Finally, the two branches are fused to produce the segmentation result. In experiments on the public CelebAMask-HQ and Cityscapes datasets, the model achieves mean intersection-over-union (mIoU) scores of 74.64% and 78.29%, respectively. The results show that the proposed method attains higher segmentation accuracy than the compared methods while using fewer parameters.

Key words: deep learning; image semantic segmentation; dilated convolution; dense connection; multi-stage features
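
The abstract reports accuracy as mean intersection-over-union (mIoU). As a reading aid, the sketch below shows how that metric is commonly computed from flattened per-pixel labels: for each class, IoU = TP / (TP + FP + FN), and mIoU is the average over classes. The function names `confusion_matrix` and `mean_iou`, the toy labels, and the choice to skip classes absent from both label and prediction are illustrative assumptions; this is not the authors' evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(y_true, y_pred, num_classes):
    """Average per-class IoU = TP / (TP + FP + FN) over the classes present."""
    matrix = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # true class c, predicted otherwise
        fp = sum(row[c] for row in matrix) - tp   # predicted c, true class differs
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes absent from label and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes and 6 "pixels" (hypothetical data).
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
print(mean_iou(labels, preds, num_classes=3))  # per-class IoU: 1/3, 2/3, 1/2
```

In the toy example the three per-class IoU values average to about 0.5. Benchmark toolkits usually accumulate the confusion matrix over the whole validation set before averaging, so per-image averages can differ from a reported dataset-level mIoU.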


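Cite this article

ZHANG Fu-Cai, XU Jian-Long, BAO Xiao-An. Image Semantic Segmentation Model Based on Dense Dilation Convolution. Computer Systems & Applications, 2022, 31(3): 19-29.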


History

Received: 2021-05-23
Revised: 2021-06-21
Published online: 2022-01-24
