Received: November 19, 2019; Revised: December 11, 2019
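Abstract (translated from the Chinese): Traditional feature selection methods such as information gain exhibit selection bias, handle nonlinear problems poorly, and require tedious manual parameter tuning. To address these problems, a two-stage feature selection fusion algorithm based on the maximum information coefficient and the Pearson correlation coefficient is proposed, with a genetic algorithm automatically optimizing its two hyperparameters. In the first stage, the maximum information coefficient measures the correlation between each feature and the label, and features are selected on that basis. In the second stage, the Pearson correlation coefficient is used to remove redundancy from the resulting feature subset. The genetic algorithm then automatically optimizes the two hyperparameters of the two stages. The method is tested on several UCI datasets. Experimental results show that the algorithm both reduces the dimensionality of the feature space and improves classification performance.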
Abstract: Traditional feature selection methods have several shortcomings: information gain favors features that take many distinct values, the Pearson correlation coefficient alone cannot capture nonlinear correlation, and manual tuning of algorithm parameters is tedious. To address these problems, a feature selection fusion approach based on the maximum information coefficient and the Pearson correlation coefficient is proposed, in which a genetic algorithm optimizes the parameters automatically. In the first stage, features are selected according to the maximum information coefficient, which measures the correlation between features and labels. In the second stage, the Pearson correlation coefficient is used to remove redundant features from the selected subset. The two hyperparameters of these two stages are optimized automatically by the genetic algorithm. Experimental results show that the algorithm can reduce the dimension of the feature space while improving classification performance.
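The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`pearson`, `two_stage_select`), the greedy redundancy rule, the threshold values, and the toy data are all assumptions made for illustration. To stay dependency-free, the example passes the absolute Pearson coefficient as a stand-in relevance score; it does not compute the maximum information coefficient, which would be supplied through the `relevance` argument instead. The genetic-algorithm search over the two thresholds is omitted.

```python
from math import sqrt
from typing import Callable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def two_stage_select(
    columns: Sequence[Sequence[float]],
    labels: Sequence[float],
    relevance: Callable[[Sequence[float], Sequence[float]], float],
    relevance_threshold: float,   # hypothetical hyperparameter 1
    redundancy_threshold: float,  # hypothetical hyperparameter 2
) -> List[int]:
    """Return indices of the features kept after both stages."""
    # Stage 1: keep features whose relevance to the label is high enough.
    scores = [relevance(col, labels) for col in columns]
    candidates = [j for j, s in enumerate(scores) if s >= relevance_threshold]
    candidates.sort(key=lambda j: scores[j], reverse=True)

    # Stage 2 (assumed greedy rule): visit candidates from most to least
    # relevant and drop any that is strongly correlated with a kept feature.
    selected: List[int] = []
    for j in candidates:
        if all(abs(pearson(columns[j], columns[k])) < redundancy_threshold
               for k in selected):
            selected.append(j)
    return selected


# Toy data (made up): f0 is relevant, f1 is a near copy of f0, f2 is noise.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [1, 2, 3, 7, 8, 9],
    [1, 2, 4, 6, 8, 9],
    [5, 1, 4, 2, 5, 1],
]

# Stand-in relevance score: |Pearson|, NOT the maximum information coefficient.
selected = two_stage_select(
    columns, labels,
    relevance=lambda x, y: abs(pearson(x, y)),
    relevance_threshold=0.5,
    redundancy_threshold=0.9,
)
print(selected)
```

In this sketch the two thresholds play the role of the two hyperparameters mentioned in the abstract. An optimizer such as a genetic algorithm would search over them, scoring each pair by the accuracy of a classifier trained on the selected features.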
Keywords: maximum information coefficient; Pearson correlation coefficient; feature selection; genetic algorithm; parameter optimization
Funding: Natural Science Foundation of Zhejiang Province (LY17F030024); Zhejiang Provincial Public Welfare Technology Research Project (GG20F030031)
Citation:
WU Jun, KE Liu-Ting, REN Jia. Parameter Automatic Optimization for Feature Selection Fusion Algorithm. Computer Systems & Applications (计算机系统应用), 2020, 29(7): 145-151 (in Chinese; original title: 参数自动优化的特征选择融合算法)