Combined Feature Selection Algorithm Based on Mutual Information
Fund Project: National Natural Science Foundation of China (11401115)


    Abstract:

    Reducing the dimensionality of candidate features plays an important role in machine learning tasks such as classification and clustering. Most existing methods select features based either on the dependence of a single feature on the target Y, or on the relevance, complementarity, and redundancy among features with respect to Y. However, almost none of these methods consider combined features: for example, attributes A and B may each carry very little information about Y, or even be completely independent of Y, while their combination A&B provides a large amount of information about Y, or even determines Y completely. Motivated by this, we propose a feature selection algorithm that mines both combined and single features from a feature set. First, non-significant features are combined, and new candidate features are generated according to their conditional probability distribution table. Then, single and combined features are selected using a criterion based on maximum relevance and minimum redundancy. Finally, experiments on synthetic and real data sets show that the algorithm mines the combined-feature information in the data well and, to some extent, improves the accuracy of the corresponding machine learning algorithms.
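The abstract's central observation, that two features can each be independent of Y yet jointly determine it, together with the two-stage pipeline built on it, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: features are discrete, mutual information is the plug-in estimate from counts, a combined feature simply treats each joint value pair as a new category (a stand-in for the conditional-probability-table construction, whose details the abstract does not give), "non-significant" means I(f;Y) below a hypothetical threshold `sig_threshold`, and selection uses the common difference form of mRMR (relevance minus mean redundancy). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed (a, b): p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with counts
    return sum((c / n) * log2(n * c / (cx[a] * cy[b])) for (a, b), c in cxy.items())


def combine(a, b):
    """Hypothetical combined feature: each joint pair (a_i, b_i) becomes one category."""
    return list(zip(a, b))


def build_candidates(features, y, sig_threshold):
    """Keep every single feature and add pairwise combinations of the
    non-significant ones, i.e. those with I(f;Y) < sig_threshold (assumed rule)."""
    candidates = dict(features)
    weak = [name for name, col in features.items()
            if mutual_information(col, y) < sig_threshold]
    for f, g in combinations(weak, 2):
        candidates[f"{f}&{g}"] = combine(features[f], features[g])
    return candidates


def mrmr_select(candidates, y, k):
    """Greedy mRMR, difference form: I(f;Y) minus mean I(f;s) over already selected s."""
    relevance = {name: mutual_information(col, y) for name, col in candidates.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]  # first pick: relevance only
        redundancy = sum(mutual_information(candidates[name], candidates[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(candidates)):
        remaining = [name for name in candidates if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Toy data: all 8 combinations of (a, b, c); Y = A XOR B, C is pure noise.
rows = list(product([0, 1], repeat=3))
features = {"A": [r[0] for r in rows], "B": [r[1] for r in rows], "C": [r[2] for r in rows]}
y = [a ^ b for a, b, _ in rows]

print(mutual_information(features["A"], y))                          # 0.0
print(mutual_information(combine(features["A"], features["B"]), y))  # 1.0
cands = build_candidates(features, y, sig_threshold=0.05)
print(mrmr_select(cands, y, k=2))                                    # ['A&B', 'C']
```

On this XOR toy data, A, B, and C each have zero mutual information with Y, while A&B carries the full 1 bit, so A&B is selected first. After that, every remaining candidate has zero relevance and the greedy rule simply takes the least redundant one (the noise feature C); a practical variant would stop as soon as the best score is no longer positive.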

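Cite this article:

李叶紫, 周怡璐, 王振友. Combined Feature Selection Algorithm Based on Mutual Information. 计算机系统应用 (Computer Systems & Applications), 2017, 26(8): 173-179.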

History
  • Received: 2016-12-05
  • Published online: 2017-10-31
Copyright: Institute of Software, Chinese Academy of Sciences