Abstract: This study proposes a cross-modal fusion dual attention network (CFDA-Net) for brain tumor image segmentation, addressing the insufficient fusion of multi-modal brain tumor information and the loss of detail in tumor regions. Built on an encoder-decoder architecture, the encoder branch first adopts a new convolutional block that places dense blocks and large kernel attention in parallel, which effectively fuses global and local information and mitigates gradient vanishing during backpropagation. Second, a multi-modal deep fusion module is added before the second, third, and fourth layers of the encoder to exploit the complementary information among different modalities. Then, in the decoder branch, Shuffle Attention groups the feature maps, divides each group's sub-features into two parts to capture important spatial and channel attention features, and aggregates the results. Finally, binary cross-entropy (BCE), Dice loss, and L2 loss are combined into a new hybrid loss function, which alleviates the class imbalance of brain tumor data and further improves segmentation performance. Experimental results on the BraTS2019 brain tumor dataset show that the model achieves average Dice coefficients of 0.887, 0.892, and 0.815 on the whole tumor, tumor core, and enhancing tumor regions, respectively. The proposed model segments the tumor core and enhancing tumor regions better than other advanced segmentation methods such as ADHDC-Net and SDS-MSA-Net.
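To make the encoder design concrete, the following is a minimal PyTorch sketch of a convolutional block with a dense block and large kernel attention in parallel. It is not the authors' implementation: the LKA decomposition (5x5 depthwise, 7x7 dilated depthwise, 1x1 pointwise) follows the common formulation from VAN (Guo et al.), and the fusion by summation, the `growth` rate, and the number of dense layers are all assumptions.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    # Decomposed large-kernel attention in the style of VAN (Guo et al.);
    # whether CFDA-Net uses this exact decomposition is an assumption.
    def __init__(self, ch):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, 5, padding=2, groups=ch)           # local depthwise
        self.dw_dilated = nn.Conv2d(ch, ch, 7, padding=9, dilation=3,
                                    groups=ch)                         # long-range depthwise
        self.pw = nn.Conv2d(ch, ch, 1)                                 # channel mixing

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn  # attention-weighted features

class ParallelDenseLKA(nn.Module):
    # Dense block and LKA run in parallel on the same input; fusing
    # their outputs by elementwise addition is an assumption.
    def __init__(self, ch, growth=32, layers=2):
        super().__init__()
        blocks, c = [], ch
        for _ in range(layers):
            blocks.append(nn.Sequential(
                nn.BatchNorm2d(c), nn.ReLU(inplace=True),
                nn.Conv2d(c, growth, 3, padding=1)))
            c += growth
        self.dense = nn.ModuleList(blocks)
        self.proj = nn.Conv2d(c, ch, 1)  # project concatenated dense features back to ch
        self.lka = LargeKernelAttention(ch)

    def forward(self, x):
        feats = [x]
        for blk in self.dense:
            feats.append(blk(torch.cat(feats, dim=1)))  # dense connectivity
        return self.proj(torch.cat(feats, dim=1)) + self.lka(x)
```

The dense branch preserves gradient flow through concatenated skip connections (the stated remedy for gradient vanishing), while the LKA branch contributes a large effective receptive field for global context.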
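The Shuffle Attention used in the decoder matches the description in SA-Net (Zhang and Yang, 2021): features are split into groups, each group is halved into a channel branch and a spatial branch, and the results are re-aggregated with a channel shuffle. A sketch under that assumption follows; the number of groups is a hypothetical choice.

```python
class ShuffleAttention(nn.Module):
    # Sketch of Shuffle Attention: group the feature maps, gate half of
    # each group's sub-features by channel and half by spatial position,
    # then aggregate with a channel shuffle. groups=8 is an assumption.
    def __init__(self, ch, groups=8):
        super().__init__()
        assert ch % (2 * groups) == 0
        self.groups = groups
        c = ch // (2 * groups)
        self.cweight = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.cbias = nn.Parameter(torch.ones(1, c, 1, 1))
        self.sweight = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.sbias = nn.Parameter(torch.ones(1, c, 1, 1))
        self.gn = nn.GroupNorm(c, c)

    def forward(self, x):
        b, ch, h, w = x.shape
        x = x.view(b * self.groups, ch // self.groups, h, w)
        xc, xs = x.chunk(2, dim=1)  # channel / spatial sub-features
        # channel attention: global average pooling -> per-channel gate
        sc = xc.mean(dim=(2, 3), keepdim=True)
        xc = xc * torch.sigmoid(sc * self.cweight + self.cbias)
        # spatial attention: group norm -> per-position gate
        xs = xs * torch.sigmoid(self.gn(xs) * self.sweight + self.sbias)
        out = torch.cat([xc, xs], dim=1).view(b, ch, h, w)
        # channel shuffle to mix information across the two branches
        out = out.view(b, 2, ch // 2, h, w).transpose(1, 2).reshape(b, ch, h, w)
        return out
```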
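The abstract names the three terms of the hybrid loss but not their weighting or exact form. A minimal sketch, assuming equal weights, binary per-region targets, and an L2 term between predicted probabilities and the ground-truth mask, might look like:

```python
class HybridLoss(nn.Module):
    """BCE + Dice + L2 hybrid loss. The equal weights, the smoothing
    constant, and the form of the L2 term are assumptions."""
    def __init__(self, w_bce=1.0, w_dice=1.0, w_l2=1.0, eps=1e-6):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.w_bce, self.w_dice, self.w_l2 = w_bce, w_dice, w_l2
        self.eps = eps

    def forward(self, logits, target):
        prob = torch.sigmoid(logits)
        dims = tuple(range(1, prob.ndim))       # all non-batch axes
        inter = (prob * target).sum(dims)
        denom = prob.sum(dims) + target.sum(dims)
        # soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), smoothed by eps
        dice = 1.0 - (2.0 * inter + self.eps) / (denom + self.eps)
        l2 = ((prob - target) ** 2).mean()      # MSE between prob and mask
        return (self.w_bce * self.bce(logits, target)
                + self.w_dice * dice.mean()
                + self.w_l2 * l2)
```

The Dice term is what targets class imbalance: it normalizes the overlap by the sizes of the predicted and ground-truth regions, so small tumor sub-regions are not swamped by the background as they would be under BCE alone.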