Identification System of Cancer Driver Genes Based on Graph Autoencoder and LightGBM
Authors: Xie Bing, Su Bo
    Abstract:

    Cancer driver genes play a crucial role in the initiation and progression of cancer, and identifying them accurately helps clarify the mechanisms of cancer development and advance precision medicine. Driver gene identification is hampered by the heterogeneity and complexity of the underlying data. To address these challenges, this study designs and implements ACGAI, a cancer driver gene identification system based on a graph autoencoder and LightGBM. The system first trains a graph autoencoder without supervision to capture the complex topological structure of a biomolecular network. The resulting embeddings are then concatenated with the original gene features to form enhanced gene features, which are fed into LightGBM. After training, the system outputs a predictive score for every gene in the biomolecular network, and these scores are used to identify cancer driver genes. Finally, the system provides a user-friendly, interactive Web-based visualization interface that supports cancer driver gene identification in the context of gene set analysis and offers biological interpretation of the identification results. In the authors' experiments, the system outperforms the compared methods, demonstrating its effectiveness in identifying cancer driver genes.
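
    The abstract does not specify ACGAI's encoder architecture, loss, or hyperparameters, so the following Python/NumPy code is only a minimal sketch of the first two steps under stated assumptions. It substitutes a simplified one-layer linear graph autoencoder (encoder Z = ÂXW with the GCN-normalized adjacency Â, inner-product decoder σ(ZZᵀ)), trained by plain gradient descent on a binary cross-entropy reconstruction loss, and then concatenates the learned embeddings with the original gene features. The toy network, the random gene features, the function names (`normalize_adjacency`, `train_linear_gae`), and all hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def normalize_adjacency(adj):
    """GCN-style symmetric normalization: D^-1/2 (A + I) D^-1/2."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_linear_gae(adj, features, dim=4, lr=0.2, epochs=300, seed=0):
    """Hypothetical simplified linear graph autoencoder (not the paper's model).

    Encoder: Z = A_norm @ X @ W. Decoder: sigmoid(Z @ Z.T).
    W is fitted by gradient descent on the mean binary cross-entropy between
    the reconstructed adjacency and the true adjacency with self-loops.
    """
    rng = np.random.default_rng(seed)
    ax = normalize_adjacency(adj) @ features        # one-hop propagated features
    w = rng.normal(scale=0.1, size=(features.shape[1], dim))
    target = adj + np.eye(adj.shape[0])
    losses = []
    for _ in range(epochs):
        z = ax @ w
        probs = sigmoid(z @ z.T)
        eps = 1e-9  # guards against log(0)
        loss = -np.mean(target * np.log(probs + eps)
                        + (1 - target) * np.log(1 - probs + eps))
        losses.append(float(loss))
        grad_logits = (probs - target) / target.size   # dL/d(Z Z^T)
        grad_z = (grad_logits + grad_logits.T) @ z       # chain rule through Z Z^T
        w -= lr * (ax.T @ grad_z)                        # chain rule through Z = AX W
    return ax @ w, losses


# Hypothetical toy network: gene triangles {0,1,2} and {3,4,5} bridged by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n_genes = 6
adj = np.zeros((n_genes, n_genes))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Hypothetical per-gene features (stand-ins for e.g. mutation or expression data).
features = np.random.default_rng(42).normal(size=(n_genes, 3))

embeddings, losses = train_linear_gae(adj, features)
# "Gene-enhanced" features: original features concatenated with the embeddings.
enhanced = np.hstack([features, embeddings])
print(enhanced.shape)  # (6, 7)
print(f"reconstruction loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

    The linear encoder is used here only because its gradients can be written by hand without an autodiff library; a deeper GCN encoder would follow the same encode, decode, reconstruct pattern. The `enhanced` matrix would then be the input to a gradient-boosted tree classifier: with the `lightgbm` package, one option is `LGBMClassifier().fit(enhanced, labels)` followed by `predict_proba(...)[:, 1]` as a per-gene score, where `labels` (known driver genes as positives) is an assumed labeling scheme rather than one described in the abstract.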

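Citation:

Xie Bing, Su Bo. Identification system of cancer driver genes based on graph autoencoder and LightGBM. Computer Systems & Applications (计算机系统应用), 2024, 33(10): 87–96.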

History
  • Received: March 6, 2024
  • Revised: May 6, 2024
  • Online: August 28, 2024
Copyright: Institute of Software, Chinese Academy of Sciences