Abstract:This study aims to solve the problems faced by traditional U-Net network in the semantic segmentation task of street scene images, such as the low accuracy of object segmentation under multi-scale categories and the poor correlation of image context features. To this end, it proposes an improved U-Net semantic segmentation network AS-UNet to achieve accurate segmentation of street scene images. Firstly, the spatial and channel squeeze & excitation block (scSE) attention mechanism module is integrated into the U-Net network to guide the convolutional neural network to focus on semantic categories related to segmentation tasks in both channel and space dimensions, to extract more effective semantic information. Secondly, to obtain the global context information of the image, the multi-scale feature map is aggregated for feature enhancement, and the atrous spatial pyramid pooling (ASPP) multi-scale feature fusion module is embedded into the U-Net network. Finally, the cross-entropy loss function and Dice loss function are combined to solve the problem of unbalanced target categories in street scenes, and the accuracy of segmentation is further improved. The experimental results show that the mean intersection over union (MIoU) of the AS-UNet network model in the Cityscapes and CamVid datasets increases by 3.9% and 3.0%, respectively, compared with the traditional U-Net network. The improved network model significantly improves the segmentation effect of street scene images.