Abstract: Network-linked data are increasingly common in statistical learning, machine learning, and related fields. For linear regression models, existing research on variable selection with network-linked data focuses mainly on homogeneous samples, that is, samples whose individual effects are all the same. In practice, however, the individual effects of most samples are heterogeneous, and neglecting this heterogeneity leads to large deviations in model estimation and prediction. This paper therefore proposes a new variable selection method, SNC, for network-linked data with group heterogeneity. Exploiting the network agglomeration effect, we jointly penalize the differences in the variable coefficients and in the individual effects between connected samples, solve the resulting problem with the ADMM algorithm, and prove that the algorithm converges. Numerical simulations and an example analysis show that the method improves the accuracy of variable selection and reduces the prediction error.