Abstract: Convolutional neural networks (CNNs), an important component of U-Net baseline networks in medical image segmentation, mainly model relationships among local features. The Transformer is a vision model that effectively strengthens long-range dependencies among features. Previous studies show that combining Transformers with CNNs can improve the accuracy of medical image segmentation to a certain extent. However, labeled medical images are scarce, whereas Transformer models require large amounts of training data and suffer from long training times and large parameter counts. To address these issues, this paper proposes LM-UNet, a novel medical image segmentation model that builds on the UNeXt model and combines a multi-scale hybrid multi-layer perceptron (MLP) with a CNN. The model effectively enhances the connection between local and global information and strengthens feature fusion. In experiments on multiple datasets, LM-UNet achieves an average Dice coefficient of 92.58% and an average intersection over union (IoU) of 86.52% on the International Skin Imaging Collaboration (ISIC) 2018 dataset, which are 3% and 3.5% higher than those of the UNeXt model, respectively. On the Osteoarthritis Initiative-Zuse Institute Berlin two-dimensional (OAI-ZIB 2D) and breast ultrasound image (BUSI) datasets, the proposed model also outperforms UNeXt, with average Dice coefficients that are 2.5% and 1.0% higher, respectively. In summary, the LM-UNet model not only improves the accuracy of medical image segmentation but also provides better generalization performance.
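
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how the Dice coefficient and IoU are commonly computed for a single pair of binary masks. It is not the authors' evaluation code: the function name `dice_and_iou`, the nested-list mask format, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration. The averages quoted in the abstract would presumably be obtained by averaging such per-image scores over a test set.

```python
from typing import Sequence, Tuple


def dice_and_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of identical shape.

    Hypothetical helper for illustration; masks are nested sequences
    of 0/1 values (rows of pixels).
    """
    intersection = 0  # pixels that are foreground in both masks
    pred_sum = 0      # foreground pixels in the prediction
    target_sum = 0    # foreground pixels in the ground truth
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p:
                pred_sum += 1
            if t:
                target_sum += 1
            if p and t:
                intersection += 1

    union = pred_sum + target_sum - intersection
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    dice, iou = dice_and_iou(predicted, ground_truth)
    print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single mask pair the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. That is consistent with the 92.58% Dice and 86.52% IoU reported above, although the identity does not hold exactly for averages taken over many images.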