Abstract: Semantic segmentation is a challenging task owing to the complexity of scene parsing, the diversity of objects to be segmented, and the variation in objects' spatial positions. To address these challenges, this paper proposes a novel architecture, the double-branch and multi-stage network (DBMSNet), based on dense dilated convolution. First, four feature maps (De1, De2, De3, and De4) with different resolutions are extracted by the backbone network, and the feature refinement (FR) module then outputs refined maps of De1 and De3. Second, the FR output branch is processed by the mixed dilation module (MDM) to extract rich spatial location features, while the De4 branch is processed by the pyramid pooling module (PPM) to extract multi-scale semantic information. Finally, the two branches are fused and the segmentation result is output. Comprehensive experiments are conducted on two public datasets, CelebAMask-HQ and Cityscapes, on which our model achieves mean intersection-over-union (mIoU) scores of 74.64% and 78.29%, respectively. The results show that the proposed method achieves higher segmentation accuracy than counterpart methods while using fewer parameters.