Received: January 10, 2013; Revised: March 11, 2013
Abstract: The mutual k-nearest neighbor algorithm (MKnn) is an effective improvement on the k-nearest neighbor (kNN) method. However, it handles nominal (categorical) attributes in the usual way: two values contribute a distance of 0 when they are equal and 1 when they differ. As a result, its classification performance is limited to some degree on data sets with many nominal attributes. To address this shortcoming, this paper introduces the concept of a category Gini coefficient for nominal data. For samples of the same class, the Gini coefficient measures how much the distribution of a nominal attribute's values contributes to that class, and this contribution is used as the attribute's weight. The weighted attributes are then used to estimate the similarity between samples, which optimizes MKnn and broadens its range of application. Experimental results demonstrate the effectiveness of the proposed method.
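The abstract contrasts the plain 0/1 overlap distance with a class-dependent, Gini-based weighting, but it does not give the exact weighting formula. The sketch below is illustrative only and is not the authors' method. It assumes that the weight of nominal attribute j for class c is 1 minus the Gini impurity of that attribute's values within class c. Under that assumption, an attribute whose values are concentrated in a class gets a larger weight. A mismatch between a query and a training sample is then scaled by the weight for the sample's class. The function names (`gini`, `class_attribute_weights`, `overlap_distance`, `weighted_overlap_distance`) are hypothetical. Numeric attributes and the mutual-neighbor step of MKnn are omitted.

```python
from collections import Counter, defaultdict


def gini(values):
    """Gini impurity 1 - sum_v p(v)^2 of a list of categorical values."""
    n = len(values)
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def class_attribute_weights(X, y):
    """Hypothetical per-class weights: w[c][j] = 1 - Gini of attribute j within class c.

    Assumption (not taken from the paper): an attribute whose values are
    concentrated within a class (low Gini impurity) is weighted more heavily
    for that class.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    weights = {}
    for label, rows in by_class.items():
        n_attrs = len(rows[0])
        weights[label] = [1.0 - gini([r[j] for r in rows]) for j in range(n_attrs)]
    return weights


def overlap_distance(a, b):
    """Plain overlap distance for nominal data: each attribute adds 0 if equal, 1 otherwise."""
    return sum(0 if u == v else 1 for u, v in zip(a, b))


def weighted_overlap_distance(query, sample, sample_label, weights):
    """Overlap distance in which each mismatch is scaled by the sample class's attribute weight."""
    w = weights[sample_label]
    return sum(w[j] for j, (u, v) in enumerate(zip(query, sample)) if u != v)


if __name__ == "__main__":
    # Toy nominal data set: (colour, size) -> class label.
    X = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "small")]
    y = ["A", "A", "B", "B"]
    weights = class_attribute_weights(X, y)
    print(weights)  # {'A': [1.0, 0.5], 'B': [1.0, 1.0]}
    query = ("red", "small")
    print(overlap_distance(query, X[1]))                         # 1
    print(weighted_overlap_distance(query, X[1], "A", weights))  # 0.5
```

In the toy data, "size" varies within class A, so a size mismatch against a class-A sample costs only 0.5 instead of 1. The plain overlap distance cannot express this distinction. These per-sample distances would then feed the neighbor search in place of the 0/1 overlap distance.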
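Foundation items: National Natural Science Foundation of China (61070062); Major Science and Technology Project for University-Industry Cooperation of Fujian Province (2010H6007); Class B Project of the Education Department of Fujian Province (JB12201)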
Citation:
CHEN Xue-Yun, GUO Gong-De, CHEN Li-Fei, LU Wei-Sheng. GwMKnn: MKnn algorithm for Nominal Data by Gini Weight. COMPUTER SYSTEMS APPLICATIONS, 2013, 22(8): 103-108, 158.