Abstract: In computer vision, Transformer-based image segmentation models require large amounts of image data to achieve their best performance. However, medical image data are scarce compared with natural images. Convolution, with its stronger inductive bias, is better suited to medical images. To combine the long-range representation learning of the Transformer with the inductive bias of the CNN, this work designs a residual ConvNeXt module that mimics the structural design of the Transformer. The module, composed of depthwise convolution and pointwise convolution, extracts feature information while greatly reducing the number of parameters. The receptive field and feature channels are effectively scaled and expanded to enrich the feature information. In addition, an asymmetric 3D U-shaped network called ASUNet is proposed for brain tumor image segmentation. In the asymmetric U-shaped structure, the output features of the last two encoders are joined by a residual connection to expand the number of channels. Finally, deep supervision is applied during upsampling, which promotes the recovery of semantic information. Experimental results on the BraTS 2020 and FeTS 2021 datasets show that the Dice scores for the enhancing tumor (ET), whole tumor (WT), and tumor core (TC) reach 77.08%, 90.83%, and 83.41%, and 75.63%, 90.45%, and 84.21%, respectively. Comparative experiments show that ASUNet fully competes with Transformer-based models in accuracy while retaining the simplicity and efficiency of standard convolutional neural networks.
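To make the module's structure concrete, the following is a minimal PyTorch sketch of a 3D residual ConvNeXt-style block as the abstract describes it: a depthwise convolution followed by pointwise convolutions that expand and then project the channels, with a residual connection around the block. The kernel size, 4x channel expansion, and normalization choice are assumptions borrowed from the original ConvNeXt design, not values confirmed by this paper.

```python
import torch
import torch.nn as nn


class ResidualConvNeXtBlock3D(nn.Module):
    """Sketch of a residual ConvNeXt-style 3D block:
    depthwise conv -> norm -> pointwise expansion -> GELU -> pointwise
    projection, plus a residual connection. Hyperparameters are assumptions."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        # Depthwise 3D convolution: one filter per channel (groups=channels),
        # which is what keeps the parameter count low.
        self.dwconv = nn.Conv3d(channels, channels, kernel_size=7,
                                padding=3, groups=channels)
        # Normalization choice is an assumption (ConvNeXt uses LayerNorm).
        self.norm = nn.InstanceNorm3d(channels)
        # Pointwise (1x1x1) convolutions expand then project the channels.
        self.pwconv1 = nn.Conv3d(channels, expansion * channels, kernel_size=1)
        self.act = nn.GELU()
        self.pwconv2 = nn.Conv3d(expansion * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        return x + residual  # residual connection preserves the input signal


if __name__ == "__main__":
    block = ResidualConvNeXtBlock3D(channels=32)
    x = torch.randn(1, 32, 16, 16, 16)  # (batch, channels, D, H, W)
    print(block(x).shape)  # torch.Size([1, 32, 16, 16, 16])
```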