Identification System of Cancer Driver Genes Based on Graph Autoencoder and LightGBM
Authors: Xie Bing, Su Bo
Funding: Sichuan Provincial Center for Educational Informatization and Big Data, 2022 Project (DSJ2022214)
Abstract:

Cancer driver genes play a crucial role in the formation and progression of cancer. Accurately identifying them deepens understanding of how cancer develops and advances precision medicine. To address the heterogeneity and complexity that challenge current cancer driver gene identification, this study designs and implements ACGAI, a cancer driver gene identification system based on a graph autoencoder and LightGBM. The system first uses a graph autoencoder to learn the complex topological structure of the biomolecular network in an unsupervised manner. It then concatenates the resulting embeddings with the original gene features to form enhanced gene features, which are fed into LightGBM. After training, the system outputs a prediction score for each gene in the biomolecular network, from which cancer driver genes are identified. Finally, the system provides a user-friendly, interactive Web-based visualization interface that supports cancer driver gene identification in gene set analysis scenarios and offers biological interpretation of the results. In testing, the system achieved better identification performance than the other methods compared and identified cancer driver genes effectively.
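The pipeline described in the abstract (unsupervised graph embedding, concatenation with gene features, LightGBM scoring) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the encoder architecture, so a one-layer linear graph autoencoder written in NumPy stands in for it, the toy network, features, and labels are randomly generated, and the names `normalize_adjacency` and `train_linear_gae` are hypothetical. The LightGBM step runs only if the `lightgbm` package is installed.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as used by GCN-style encoders."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def train_linear_gae(adj, feats, dim=8, lr=0.5, epochs=200, seed=0):
    """Unsupervised training of a one-layer linear graph autoencoder (a simplification).

    Encoder: Z = A_norm @ X @ W. Decoder: sigmoid(Z @ Z.T), trained to
    reconstruct the adjacency matrix with a binary cross-entropy loss.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    h = normalize_adjacency(adj) @ feats          # neighborhood-averaged features
    w = rng.normal(scale=0.1, size=(feats.shape[1], dim))
    for _ in range(epochs):
        z = h @ w
        probs = 1.0 / (1.0 + np.exp(-(z @ z.T)))  # predicted edge probabilities
        grad_s = (probs - adj) / (n * n)          # gradient of mean BCE w.r.t. logits
        grad_z = 2.0 * grad_s @ z                 # grad_s is symmetric for undirected graphs
        w -= lr * (h.T @ grad_z)                  # plain gradient descent step
    return h @ w

# Toy stand-ins for a biomolecular network and per-gene features (random, illustrative only).
rng = np.random.default_rng(42)
n_genes, n_feats = 40, 5
upper = np.triu(rng.random((n_genes, n_genes)) < 0.1, k=1)
adj = (upper | upper.T).astype(float)             # undirected, no self-loops
gene_feats = rng.normal(size=(n_genes, n_feats))
labels = np.zeros(n_genes, dtype=int)
labels[:10] = 1                                   # pretend the first 10 genes are known drivers

# Step 1: learn topology-aware embeddings without using the labels.
embeddings = train_linear_gae(adj, gene_feats, dim=8)

# Step 2: concatenate embeddings with the original features ("enhanced" features).
enhanced = np.concatenate([gene_feats, embeddings], axis=1)

# Step 3: supervised scoring with LightGBM, skipped if the package is unavailable.
try:
    import lightgbm as lgb
except ImportError:
    lgb = None

if lgb is not None:
    clf = lgb.LGBMClassifier(n_estimators=50, min_child_samples=2, verbose=-1)
    clf.fit(enhanced, labels)
    scores = clf.predict_proba(enhanced)[:, 1]    # one prediction score per gene
```

In this sketch the embedding step never sees the labels, which mirrors the unsupervised stage described above. Scoring the training genes themselves is done only to keep the example short; a real evaluation would hold out genes or use cross-validation.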

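Cite this article

Xie B, Su B. Identification system of cancer driver genes based on graph autoencoder and LightGBM. Computer Systems & Applications, 2024, 33(10): 87–96.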

History
  • Received: 2024-03-06
  • Last revised: 2024-05-06
  • Published online: 2024-08-28