Received: January 25, 2024; Revised: February 29, 2024
Abstract: The partially linear model, an important semiparametric regression model, is widely used across fields because it adapts flexibly to complex data structures. In the big-data setting, however, its study and application face several challenges, the most critical being computing speed and data storage. This study considers data streams observed continuously in the form of data blocks and proposes an online method for estimating both the parameters of the linear part and the unknown function of the nonlinear part of the partially linear model. The method produces real-time estimates using only the current data block and previously computed summary statistics. Numerical simulations assess its effectiveness from two angles, varying the size of each data block and the total sample size of the stream, and compare the bias, standard error, and mean squared error of the online method with those of the traditional method. The experiments show that, compared with the traditional method, the proposed method computes quickly and does not need to revisit historical data, while achieving a mean squared error close to that of the traditional method. Finally, using data from the Chinese General Social Survey (CGSS), this study applies the online method to analyze the factors influencing the quality of life of China's working-age population. The results indicate that full-time work of 30–60 h per week has a positive effect on quality of life, which may serve as a reference for related policy making.
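The idea of updating an estimate from the current data block plus stored summary statistics can be illustrated with a minimal sketch. This is not the estimator proposed in the paper, whose details are not given in the abstract. It assumes, purely for illustration, that the nonlinear part is approximated by a fixed cubic polynomial basis, so that the model becomes linear in its coefficients and the running sums Z'Z and Z'y are sufficient. The names `design` and `StreamingLeastSquares`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def design(x, t, degree=3):
    """Hypothetical helper: linear covariates plus a polynomial basis in t."""
    basis = np.vander(t, degree + 1, increasing=True)  # columns 1, t, ..., t^degree
    return np.hstack([x, basis])


class StreamingLeastSquares:
    """Keeps only the summary statistics Z'Z and Z'y between blocks."""

    def __init__(self, dim):
        self.gram = np.zeros((dim, dim))
        self.moment = np.zeros(dim)

    def update(self, Z, y):
        # Fold the current block into the running sums; the block itself
        # is not stored and can be discarded afterwards.
        self.gram += Z.T @ Z
        self.moment += Z.T @ y

    def coef(self):
        # lstsq tolerates a rank-deficient Gram matrix in early blocks.
        return np.linalg.lstsq(self.gram, self.moment, rcond=None)[0]


# Simulated stream (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, block_size = 2000, 200
beta_true = np.array([1.5, -2.0])
x = rng.normal(size=(n, 2))
t = rng.uniform(0.0, 1.0, size=n)
y = x @ beta_true + np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=n)

model = StreamingLeastSquares(dim=2 + 4)  # 2 linear terms + 4 basis columns
for start in range(0, n, block_size):
    stop = start + block_size
    model.update(design(x[start:stop], t[start:stop]), y[start:stop])

online_coef = model.coef()
batch_coef = np.linalg.lstsq(design(x, t), y, rcond=None)[0]
print("online estimate of the linear part:", online_coef[:2])
print("max |online - batch|:", np.max(np.abs(online_coef - batch_coef)))
```

With a fixed basis, the block-wise sums reproduce the full-data least-squares fit up to rounding, which is why no historical block has to be revisited. Estimators that use data-dependent smoothing, such as kernel methods with a changing bandwidth, generally need additional approximations to be updated online.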
Author Name | Affiliation | E-mail |
LU Guo-Lin | School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei 230026, China | luguolin@mail.ustc.edu.cn |
Citation:
LU Guo-Lin. Online Estimation for Partially Linear Model in Data Streams. COMPUTER SYSTEMS APPLICATIONS, 2024, 33(10): 152-162