Wave Random Self-attention Encoder for 3D Organ Image Segmentation
Authors: 周迪, 刘豪, 程远志, 李辉, 刘晓亚
Funding: National Key R&D Program of China (2023YFF0612102); Qingdao Key Science and Technology Research and Industrialization Demonstration Project (23-7-2-qljh-4-gx, 24-1-2-qljh-19-gx)

    Abstract:

    In spectral 3D CT data, traditional convolution captures global features poorly, while full-scale self-attention consumes substantial computational resources. To address this problem, this study introduces a new visual attention paradigm, wave self-attention (WSA). Compared with ViT, this mechanism obtains the same self-attention information with fewer resources. In addition, to extract the relative dependencies among organs more fully and to improve the robustness and execution speed of the model, a plug-and-play module, the wave random encoder (WRE), is designed for the WSA mechanism. The encoder generates a pair of mutually inverse, asymmetric global (local) position information matrices. The global position matrix performs global random sampling of the wave features, and the local position matrix complements the local relative dependencies lost to random sampling. Experiments are performed on kidney and lung parenchyma segmentation tasks on the standard Synapse and COVID-19 datasets. The results show that the proposed method outperforms existing models such as nnFormer and Swin-UNETR in accuracy, parameter count, and inference speed, reaching the SOTA level.
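The notion of a pair of mutually inverse position matrices can be illustrated with a toy sketch: a random permutation matrix shuffles tokens globally (the "random sampling" step), and its transpose, which for a permutation matrix equals its inverse, restores the original local order. This is a minimal illustration under my own assumptions, not the paper's actual WRE implementation; the function and variable names are mine.

```python
import numpy as np

def wave_random_encoding(tokens, seed=0):
    """Toy sketch of mutually inverse position matrices (hypothetical names).

    tokens: (n, d) array of token features.
    Returns the globally shuffled tokens and the restored originals.
    """
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]
    perm = rng.permutation(n)
    P = np.eye(n)[perm]           # "global" position matrix: shuffles rows
    P_inv = P.T                   # for permutation matrices, inverse == transpose
    shuffled = P @ tokens         # global random sampling of the features
    restored = P_inv @ shuffled   # local order recovered exactly
    return shuffled, restored
```

In the paper the second matrix also reinjects local relative dependencies rather than merely undoing the shuffle; the sketch only shows the invertibility that makes lossless global sampling possible.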

    References
    [1] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. Proceedings of the 18th International Conference on Medical Image Computing and Computer-assisted Intervention. Munich: Springer, 2015. 234–241.
    [2] Milletari F, Navab N, Ahmadi SA. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. Proceedings of the 4th International Conference on 3D Vision (3DV). Stanford: IEEE, 2016. 565–571.
    [3] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale. Proceedings of the 9th International Conference on Learning Representations. ICLR, 2021.
    [4] Cai YT, Wang Y. MA-Unet: An improved version of Unet based on multi-scale and attention mechanism for medical image segmentation. Proceedings of the 3rd International Conference on Electronics and Communication; Network and Computer Technology. Harbin: SPIE, 2022. 121670X.
    [5] Hatamizadeh A, Tang YC, Nath V, et al. UNETR: Transformers for 3D medical image segmentation. Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2022. 1748–1758.
    [6] Cao H, Wang YY, Chen J, et al. Swin-Unet: Unet-like pure Transformer for medical image segmentation. Proceedings of the 2023 European Conference on Computer Vision. Tel Aviv: Springer, 2023. 205–218.
    [7] Butanovs E, Zolotarjovs A, Kuzmin A, et al. Nanoscale X-ray detectors based on individual CdS, SnO2 and ZnO nanowires. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 2021, 1014: 165736.
    [8] Peken T, Adiga S, Tandon R, et al. Deep learning for SVD and hybrid beamforming. IEEE Transactions on Wireless Communications, 2020, 19(10): 6621–6642.
    [9] Bendjador H, Deffieux T, Tanter M, et al. The SVD beamformer: Physical principles and application to ultrafast adaptive ultrasound. IEEE Transactions on Medical Imaging, 2020, 39(10): 3100–3112.
    [10] Abdelwahab KM, Abd El-Atty SM, El-Shafai W, et al. Efficient SVD-based audio watermarking technique in FRT domain. Multimedia Tools and Applications, 2020, 79(9): 5617–5648.
    [11] Diwakar M, Kumar P, Singh P, et al. An efficient reversible data hiding using SVD over a novel weighted iterative anisotropic total variation based denoised medical images. Biomedical Signal Processing and Control, 2023, 82: 104563.
    [12] Bhatti A, Ishii T, Kanno N, et al. Region-based SVD processing of high-frequency ultrafast ultrasound to visualize cutaneous vascular networks. Ultrasonics, 2023, 129: 106907.
    [13] Shi MW, Zhang F, Wang SW, et al. Detail preserving image denoising with patch-based structure similarity via sparse representation and SVD. Computer Vision and Image Understanding, 2021, 206: 103173.
    [14] Ding XH, Zhang XY, Han JG, et al. Scaling up your kernels to 31×31: Revisiting large kernel design in CNNs. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 11953–11965.
    [15] He KM, Zhang XY, Ren SQ, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015. 1026–1034.
    [16] He KM, Chen XL, Xie SN, et al. Masked autoencoders are scalable vision learners. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 15979–15988.
    [17] Çiçek Ö, Abdulkadir A, Lienkamp SS, et al. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Proceedings of the 19th International Conference on Medical Image Computing and Computer-assisted Intervention. Athens: Springer, 2016. 424–432.
    [18] Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
    [19] Clark K, Vendt B, Smith K, et al. The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging, 2013, 26(6): 1045–1057.
    [20] Yang XY, He XH, Zhao JY, et al. COVID-CT-dataset: A CT scan dataset about COVID-19. arXiv:2003.13865, 2020.
    [21] Isensee F, Jaeger PF, Kohl SAA, et al. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 2021, 18(2): 203–211.
    [22] Yang AS, Xu LL, Qin N, et al. MFU-Net: A deep multimodal fusion network for breast cancer segmentation with dual-layer spectral detector CT. Applied Intelligence, 2024, 54(5): 3808–3824.
    [23] Zhou HY, Guo JS, Zhang YH, et al. nnFormer: Volumetric medical image segmentation via a 3D Transformer. IEEE Transactions on Image Processing, 2023, 32: 4036–4045.
    [24] Hatamizadeh A, Nath V, Tang YC, et al. Swin UNETR: Swin Transformers for semantic segmentation of brain tumors in MRI images. Proceedings of the 7th International Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Springer, 2022. 272–284.
    [25] Mahmoudi R, Benameur N, Mabrouk R, et al. A deep learning-based diagnosis system for COVID-19 detection and pneumonia screening using CT imaging. Applied Sciences, 2022, 12(10): 4825.
    [26] Ma J, Wang Y, An X, et al. Toward data-efficient learning: A benchmark for COVID-19 CT lung and infection segmentation. Medical Physics, 2021, 48(3): 1197–1210.
    [27] Müller D, Soto-Rey I, Kramer F. Robust chest CT image segmentation of COVID-19 lung infection based on limited data. Informatics in Medicine Unlocked, 2021, 25: 100681.
    [28] Alirr OI. Automatic deep learning system for COVID-19 infection quantification in chest CT. Multimedia Tools and Applications, 2022, 81(1): 527–541.
    [29] Punn NS, Agarwal S. CHS-Net: A deep learning approach for hierarchical segmentation of COVID-19 via CT images. Neural Processing Letters, 2022, 54(5): 3771–3792.
Cite this article:

周迪, 刘豪, 程远志, 李辉, 刘晓亚. Wave Random Self-attention Encoder for 3D Organ Image Segmentation. 计算机系统应用, 2025, 34(2): 84–91.
History: Received 2024-07-13; Revised 2024-08-13; Published online 2024-12-19