Combined Feature Selection Algorithm Based on Mutual Information
Funding: National Natural Science Foundation of China (11401115)
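    Abstract:

    Reducing the dimensionality of the candidate feature set plays an important role in machine learning tasks such as classification and clustering. Most existing methods select features based on the dependence of a single feature on the target Y, or on the relevance, complementarity, and redundancy among features with respect to their influence on Y. However, almost none of these methods consider combined features. For example, attributes A and B may each contain very little information about Y, or even be completely independent of Y, while their combination A&B provides a large amount of information about Y, or even completely determines Y. Motivated by this, we propose a feature selection algorithm that mines both combined features and single features from a feature set. First, non-significant features are combined, and new candidate features are generated according to their conditional probability distribution tables. Then, single features and combined features are selected with a criterion based on maximum relevance and minimum redundancy. Finally, experiments on synthetic and real datasets show that the proposed algorithm effectively mines the combined-feature information in a dataset and, to some extent, improves the accuracy of the corresponding machine learning algorithms.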
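The motivating case above, two attributes that carry no information about Y individually but determine it jointly, can be checked numerically. The sketch below is an illustration added here, not code from the paper: it assumes binary attributes with the hypothetical target Y = A XOR B (one concrete instance of the case) and estimates mutual information in bits from empirical frequencies.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # sum over observed (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y))), written with counts
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


# Every combination of two binary attributes, with the hypothetical target Y = A XOR B.
A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
Y = [a ^ b for a, b in zip(A, B)]
AB = list(zip(A, B))  # combined feature A&B: one value per (a, b) pair

print(mutual_information(A, Y))   # 0.0
print(mutual_information(B, Y))   # 0.0
print(mutual_information(AB, Y))  # 1.0
```

Each attribute alone yields 0 bits about Y, while the pair yields the full 1 bit, so a criterion that scores only single features would discard both A and B.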

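The two-step procedure (combine non-significant features, then select with a max-relevance, min-redundancy criterion) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the significance threshold `alpha`, the helper names `add_combined_features` and `mrmr_select`, and the encoding of a combined feature as the joint value (a, b), i.e. the row key of the conditional probability table P(Y | A, B), are all hypothetical choices. The selection step uses the difference form of mRMR (Peng, Long and Ding, 2005), which scores a candidate by its relevance I(f;Y) minus its mean mutual information with the features already selected.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def add_combined_features(features, y, alpha):
    """Add a combined candidate for every pair of non-significant features.

    Hypothetical rule: a feature is non-significant when I(f;Y) < alpha, and the
    combined feature "A&B" takes the joint value (a, b), i.e. the row key of the
    conditional probability table P(Y | A, B).
    """
    weak = [name for name, col in features.items() if mutual_information(col, y) < alpha]
    candidates = dict(features)  # single features stay in the candidate pool
    for a, b in combinations(weak, 2):
        candidates[f"{a}&{b}"] = list(zip(features[a], features[b]))
    return candidates


def mrmr_select(candidates, y, k):
    """Greedy mRMR (difference form): maximise I(f;Y) - mean over s in S of I(f;s)."""
    relevance = {name: mutual_information(col, y) for name, col in candidates.items()}
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        scores = {}
        for name in remaining:
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(candidates[name], candidates[s]) for s in selected
                ) / len(selected)
            scores[name] = relevance[name] - redundancy
        best = max(scores, key=scores.get)  # ties go to the earliest candidate
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical synthetic data: all 16 patterns of four independent bits, with
# Y = 2*D + (A XOR B). A and B are individually uninformative, C is noise.
rows = list(product([0, 1], repeat=4))
features = {name: [row[i] for row in rows] for i, name in enumerate("ABCD")}
y = [2 * d + (a ^ b) for a, b, _, d in rows]

candidates = add_combined_features(features, y, alpha=0.05)
print(sorted(candidates))                        # ['A', 'A&B', 'A&C', 'B', 'B&C', 'C', 'D']
print(sorted(mrmr_select(candidates, y, k=2)))   # ['A&B', 'D']
```

Here A, B and C each have zero mutual information with Y, so they are the features that get paired; D carries 1 bit on its own and is kept as a single feature. mRMR then returns D together with the combined feature A&B, which a single-feature criterion would have missed.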

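Cite this article:

李叶紫, 周怡璐, 王振友. Combined Feature Selection Algorithm Based on Mutual Information. Computer Systems & Applications (计算机系统应用), 2017, 26(8): 173-179.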

History
  • Received: 2016-12-05
  • Published online: 2017-10-31