Computer Systems & Applications, 2024, 33(3): 73-84

Semantic Segmentation of Street Scenes Images Based on Multi-scale Feature Pyramid Fusion

QU Hai-Cheng, WANG Ying, DONG Kang-Long, LIU Wan-Jun

(Software College, Liaoning Technical University, Huludao 125105, China)

Received: August 31, 2023; Revised: September 26, 2023

Abstract: To address the large variation in object sizes and the difficulty of efficiently extracting multi-scale features in semantic segmentation of street scene images, this study proposes a semantic segmentation network, LDPANet. First, dilated (atrous) convolution is combined with depthwise separable convolution that incorporates residual learning units to optimize the encoder, which reduces computational complexity while alleviating the vanishing gradient problem. Second, an iterative atrous spatial pyramid with layer-wise propagation fuses top-down feature information step by step, improving the interaction of contextual information. After multi-scale feature fusion, an attribute attention module is introduced so that the network suppresses redundant information and strengthens important features. Furthermore, channel expansion upsampling replaces bilinear interpolation upsampling in the decoder to further improve the resolution of the feature maps. Finally, LDPANet reaches an accuracy of 91.8% on the Cityscapes dataset and 87.52% on the CamVid dataset. Compared with network models from recent years, the proposed model extracts pixel position information and spatial dimension information accurately and improves segmentation accuracy.

Keywords: semantic segmentation; mixed depthwise separable dilated convolution (MDSDC); iterative dilated convolution pyramid with layer cascade (IDCP-LC); attribute attention; channel expansion upsampling; feature fusion
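The abstract credits the encoder's lower computational cost to pairing dilated convolution with depthwise separable convolution. The sketch below only illustrates the standard arithmetic behind that pairing; it is not taken from the paper. The function names, the 3x3 kernel, the 256-channel width, and the dilation rates are illustrative assumptions, and bias terms are ignored.

```python
# Illustrative arithmetic only; all names and layer sizes are assumptions,
# not values reported for LDPANet.

def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of an ordinary 2D convolution (bias ignored)."""
    return c_in * c_out * kernel_size * kernel_size


def depthwise_separable_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = c_in * kernel_size * kernel_size  # one k x k filter per input channel
    pointwise = c_in * c_out                      # 1x1 conv that mixes channels
    return depthwise + pointwise


# Dilation widens the area a 3x3 kernel sees without adding weights.
for rate in (1, 2, 4, 8):
    print(f"dilation {rate}: covers {effective_kernel_size(3, rate)} pixels per axis")

# Hypothetical 256 -> 256 channel layer with a 3x3 kernel.
full = standard_conv_params(256, 256, 3)
separable = depthwise_separable_params(256, 256, 3)
print(f"standard: {full} weights, depthwise separable: {separable} weights")
print(f"reduction factor: {full / separable:.2f}")
```

Dilation changes only the sampling stride inside the kernel, so the weight counts above are the same at every dilation rate. That is why the two techniques combine without extra parameters.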

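Foundation items: General Program of the National Natural Science Foundation of China (42271409); Basic Scientific Research Project of Higher Education Institutions of Liaoning Province (LIKMZ20220699)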

Cite this article:
QU Hai-Cheng, WANG Ying, DONG Kang-Long, LIU Wan-Jun. Semantic Segmentation of Street Scenes Images Based on Multi-scale Feature Pyramid Fusion. Computer Systems & Applications, 2024, 33(3): 73-84

Author Name	Affiliation	E-mail
QU Hai-Cheng	Software College, Liaoning Technical University, Huludao 125105, China
WANG Ying	Software College, Liaoning Technical University, Huludao 125105, China	lntuwangying@163.com
DONG Kang-Long	Software College, Liaoning Technical University, Huludao 125105, China
LIU Wan-Jun	Software College, Liaoning Technical University, Huludao 125105, China