Computer Systems & Applications (English edition), 2024, 33(5): 28-36
Variable Selection in Network-linked Data with FDR Control
(Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China)
Received: November 29, 2023    Revised: December 29, 2023
Abstract: Statistical inference for network-linked data has become an active topic in statistical research in recent years. The assumption in traditional models that samples are independent often fails for modern network-linked data. This paper models an individual effect for each node in the network and, borrowing the idea of a fusion penalty, encourages the individual effects of connected nodes to be similar. It also uses the knockoff method, which imitates the dependence structure of the original variables to construct features unrelated to the target variable, and proposes a knockoff-based variable selection method for network-linked data (NLKF). We prove theoretically that NLKF controls the false discovery rate (FDR) of variable selection at the target level. When the covariance of the original data is unknown, the method retains these statistical properties with an estimated covariance matrix in its place. A comparison with the traditional Lasso variable selection method, in which NLKF shows higher statistical power, illustrates the reliability of the approach. Finally, an application in factor investing is presented, using 200 factors for more than 4,000 stocks in the Chinese A-share market from January to December 2022, with the network constructed from each stock's Shenyin Wanguo (Shenwan) first-level industry classification.
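The abstract does not say how NLKF turns its statistics into a selected set, so the following is only a minimal sketch of the standard knockoff+ selection rule from the knockoff filter literature, not the authors' implementation. It assumes that each variable j already has a statistic W_j whose large positive values favor the original variable over its knockoff copy; the function names and the example values of `W` are hypothetical.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    If no such t exists, return infinity (nothing is selected).
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics stand in for false positives among W_j >= t.
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")

def knockoff_select(W, q):
    """Return the indices of variables whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)

# Hypothetical statistics for ten variables, with a target FDR level of 0.2.
W = [5.0, 4.0, 3.0, 2.5, -1.0, 2.0, 1.5, -0.5, 0.2, 6.0]
selected = knockoff_select(W, q=0.2)
print("threshold:", knockoff_plus_threshold(W, 0.2))
print("selected indices:", selected.tolist())
```

The extra 1 in the numerator is what distinguishes knockoff+ from the plain knockoff rule; it makes the estimate of the false discovery proportion slightly conservative, which is the form usually associated with exact FDR control at level q.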
Funding: National Natural Science Foundation of China (12101584)
Citation:
LU Ying, LI Yang. Variable Selection in Network-linked Data with FDR Control. Computer Systems & Applications, 2024, 33(5): 28-36