Semantic Segmentation of Street Scene Images Based on Multi-scale Feature Pyramid Fusion

doi:10.15888/j.cnki.csa.009411

2024, Vol. 33, No. 3, pp. 73-84

CSTR: 32024.14.csa.009411
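Funding: General Program of the National Natural Science Foundation of China (42271409); Basic Scientific Research Project of Higher Education Institutions of Liaoning Province (LIKMZ20220699)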

Abstract:

To address the large variation in object size and the difficulty of extracting multi-scale features efficiently in semantic segmentation of street scene images, this paper proposes a semantic segmentation network, LDPANet. First, dilated (atrous) convolution is combined with depthwise separable convolution equipped with residual learning units to optimize the encoder, which reduces computational complexity while alleviating the vanishing gradient problem. Second, a layer-wise iterative atrous spatial pyramid fuses top-down feature information stage by stage, improving the interaction of contextual information. After multi-scale feature fusion, an attribute attention module is introduced so that the network suppresses redundant information and strengthens important features. Furthermore, channel-expansion upsampling replaces bilinear interpolation upsampling in the decoder to further improve the resolution of the feature maps. Finally, LDPANet achieves accuracies of 91.8% and 87.52% on the Cityscapes and CamVid datasets, respectively. Compared with network models from recent years, the proposed model can accurately extract pixel position information and spatial dimension information, and it improves the accuracy of semantic segmentation.
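The building blocks named in the abstract can be illustrated with a few lines of arithmetic. The sketch below is not the authors' implementation: the function names are hypothetical, and treating "channel-expansion upsampling" as a sub-pixel (pixel-shuffle style) rearrangement is an assumption, since the abstract does not define the operation. It shows how dilation widens the span of a kernel, how a depthwise separable convolution reduces the weight count compared with a standard convolution, and how extra channels can be rearranged into spatial resolution.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span along one axis covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k depthwise conv plus a 1 x 1 pointwise conv (bias omitted)."""
    return k * k * c_in + c_in * c_out


def channel_to_space(x, r):
    """Rearrange a nested list shaped [C*r*r][H][W] into [C][H*r][W*r].

    Assumed stand-in for channel-expansion upsampling: each group of r*r
    channels supplies the r x r sub-pixels of one output channel.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Source channel is chosen by the sub-pixel offset (i % r, j % r).
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out
```

For example, a 3 x 3 kernel with dilation 6 spans 13 pixels along each axis while still using only nine weights per channel, and for 256 input and 256 output channels a 3 x 3 depthwise separable convolution needs 67,840 weights against 589,824 for a standard convolution, roughly 8.7 times fewer. Unlike bilinear interpolation, which has no learnable parameters, the channel-to-space rearrangement lets the preceding convolution learn the values of every upsampled pixel.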

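Cite this article

Qu Haicheng, Wang Ying, Dong Kanglong, Liu Wanjun. Semantic Segmentation of Street Scene Images Based on Multi-scale Feature Pyramid Fusion. Computer Systems & Applications (计算机系统应用), 2024, 33(3): 73-84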

History

Received: 2023-08-31
Last revised: 2023-09-26
Published online: 2023-12-26
