Abstract: Many gene biomarker selection methods cannot be used directly in clinical diagnosis because the number of available research samples is small. Researchers have therefore proposed integrating different gene expression datasets while preserving the integrity of the biological information. However, because of batch effects, directly integrating different gene expression datasets may introduce new systematic errors. To address these problems, this paper proposes an analysis framework that integrates self-paced learning and SCAD-Net regularization. On the one hand, self-paced learning first learns a basic model from low-noise samples and then gradually incorporates high-noise samples to make the model more robust, so as to avoid the batch effect. On the other hand, SCAD-Net regularization combines biological interaction information with gene expression data and achieves better feature selection performance. Results on simulated data under different scenarios and on a breast cancer cell line dataset show that the regression model based on self-paced learning and SCAD-Net regularization obtains better prediction results on high-dimensional datasets with complex network structure.
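To make the two ingredients more concrete, here is a minimal Python/NumPy sketch built on stated assumptions. It is not the authors' implementation. The hard 0/1 self-paced weights, the median starting threshold, the growth factor `mu`, and the choice to model the "Net" part as a graph-Laplacian quadratic penalty added to the standard SCAD penalty (a = 3.7) are all illustrative assumptions. The function names (`scad_penalty`, `scad_prox`, `fit_scad_net`, `self_paced_fit`) and the toy data are hypothetical.

```python
import numpy as np


def scad_penalty(beta, lam, a=3.7):
    """Elementwise standard SCAD penalty p_lam(|beta|)."""
    b = np.abs(beta)
    return np.where(
        b <= lam,
        lam * b,
        np.where(
            b <= a * lam,
            (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )


def scad_prox(u, lam, c, a=3.7):
    """argmin_b (c/2)(b - u)^2 + p_lam(|b|); requires c > 1/(a-1)."""
    s, au = np.sign(u), abs(u)
    if au <= lam / c:
        return 0.0
    if au <= lam * (1 + 1 / c):
        return s * (au - lam / c)
    if au <= a * lam:
        return s * ((a - 1) * c * au - a * lam) / ((a - 1) * c - 1)
    return u


def fit_scad_net(X, y, w, L, lam, eta, a=3.7, n_iter=200, tol=1e-6):
    """Coordinate descent for the ASSUMED sample-weighted objective
    sum_i w_i (y_i - x_i^T beta)^2 / (2 sum_i w_i)
      + (eta/2) beta^T L beta + sum_j p_lam(|beta_j|)."""
    p = X.shape[1]
    beta = np.zeros(p)
    wn = w / w.sum()
    r = y - X @ beta  # full residual, updated in place
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            xj = X[:, j]
            # Curvature and gradient of the smooth part along coordinate j.
            v = np.sum(wn * xj**2) + eta * L[j, j]
            grad = -np.sum(wn * xj * r) + eta * (L[j] @ beta)
            # Majorizing curvature keeps the SCAD prox well defined.
            c = max(v, 1.0 / (a - 1) + 1e-3)
            new = scad_prox(beta[j] - grad / c, lam, c, a)
            if new != beta[j]:
                r -= xj * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta


def self_paced_fit(X, y, L, lam, eta, mu=1.3, n_rounds=10):
    """Hard-weight self-paced learning: samples with squared loss below the
    threshold k get weight 1 (easy / low noise), others weight 0; k grows by
    mu each round so harder samples can enter later."""
    w = np.ones(X.shape[0])
    beta = fit_scad_net(X, y, w, L, lam, eta)
    loss = (y - X @ beta) ** 2
    k = np.median(loss)  # assumed schedule: start with the easier half
    for _ in range(n_rounds):
        new_w = (loss <= k).astype(float)
        if new_w.sum() < 2:  # too few easy samples; keep the previous fit
            break
        w = new_w
        beta = fit_scad_net(X, y, w, L, lam, eta)
        loss = (y - X @ beta) ** 2
        k *= mu
    return beta, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 10
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = 2.0  # genes 0-1-2 form a connected module
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    y[:20] += 10.0  # hypothetical "batch" shift on 20 samples
    A = np.zeros((p, p))
    A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0
    L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian
    beta, w = self_paced_fit(X, y, L, lam=0.05, eta=0.01)
    print(np.round(beta, 2), int(w[:20].sum()), int(w[20:].sum()))
```

In this toy run the 20 shifted samples keep a large loss, so they should never pass the threshold and end with weight 0, while SCAD sets the coefficients of the unconnected, irrelevant genes exactly to zero. A soft weighting scheme or a different network penalty would be equally valid choices under this sketch.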