计算机系统应用 2019, Vol. 28 Issue (7): 121-126

Instance Weighted Class Dependent Relief
QIU Hai-Feng, HE Zhen-Feng
College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
Foundation item: Natural Science Foundation of Fujian Province (2018J01794)
Abstract: The Relief algorithm is a filter-type feature selection algorithm that greedily maximizes the instance margin of the nearest neighbor classifier. Combining it with a local weighting method, earlier work proposed the Class Dependent RELIEF (CDRELIEF) algorithm, which trains one feature weight vector for each class and thereby better reflects feature relevance. However, each feature weight vector only measures the relevance of the features to a single class, and when the vectors are used directly for classification the accuracy is not high enough. To apply the CDRELIEF algorithm to classification, this study changes the weight update process and assigns an instance weight to each instance in the training set. By incorporating the instance weight into the weight update formula, the influence of data points far from the classification boundary and of outliers on the weight update is excluded, thereby improving classification accuracy. The Instance Weighted CDRELIEF (IWCDRELIEF) algorithm proposed in this study is compared with CDRELIEF on multiple UCI 2-class datasets. Experimental results show that the proposed algorithm significantly improves on CDRELIEF.
Key words: Relief algorithm; feature weighting; instance weighting; local weight; classification

1 Introduction

Relief is a widely used filter-type method, first proposed in [1] for feature selection on 2-class data. Although the Relief algorithm is simple, runs efficiently, and gives satisfactory results, it is limited to 2-class data; Kononenko extended it to the multi-class case with the ReliefF algorithm, which was analyzed in depth in [2]. Although Relief has been widely applied, it still has shortcomings [3]: its mathematical form has not been well defined, so its characteristics and properties are hard to study in depth; it lacks a strong mechanism for handling outlying data points; and its robustness in noisy environments needs improvement. Many improvements to Relief have been published, such as the iterative algorithm I-RELIEF [4], which constructs an optimization objective based on margin maximization and derives the weight learning rule with an EM-like iterative strategy. In addition, [5] proposed a class dependent feature weighting Relief algorithm: because the importance of each feature may differ greatly between classes, it trains a separate weight vector for each class, overcoming the effect of differing per-class feature importance that arises when a single global weight is used.

2 Relief and Class Dependent Relief

The Relief algorithm uses a global weight, but because the feature weights in the global distance metric do not distinguish between classes, classification performance suffers when some features behave differently in different classes. Compared with global weights, local feature weights better reflect the differing importance of the same feature in different classes, so CDRELIEF learns local weights to improve the class relevance of the weights. Many methods [9,10] learn distance metrics on local regions, and metrics combining local and global information also exist [11]. The feature weights differ from class to class; the most representative method is the Class Dependent Weighted (CDW) distance metric, in which the distance depends on the class label of the prototype:

 ${d_{CDW}}(x,y) = \sqrt {\sum\limits_{j = 1}^D {w_{c,j}^2{{({x_j} - {y_j})}^2}} }$ (1)
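As a concrete illustration, the CDW distance of Eq. (1) can be computed as below (a minimal sketch; the function name and NumPy array layout are our own, and `w_c` is the weight vector associated with the class of the prototype y):

```python
import numpy as np

def cdw_distance(x, y, w_c):
    """Class dependent weighted distance of Eq. (1): each squared
    coordinate difference is scaled by the squared per-class weight."""
    x, y, w_c = np.asarray(x, float), np.asarray(y, float), np.asarray(w_c, float)
    return float(np.sqrt(np.sum((w_c ** 2) * (x - y) ** 2)))
```

With unit weights the metric reduces to the ordinary Euclidean distance, which is a quick sanity check on the formula.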

2.1 Relief

The core idea of Relief feature weighting [1] is to estimate the feature weights, and hence feature importance, from each feature's ability to distinguish instances of different classes. Given a 2-class dataset X containing N instances, let C be the set of class labels and let x be an instance in X; each instance $x = ({x_1},{x_2}, \cdots, {x_D})$ is a real-valued vector of dimension D. Relief learns iteratively as follows: randomly select an instance x, find its nearest same-class instance $NH\left( x \right)$ and nearest different-class instance $NM\left( x \right)$ , and then update the weights with the following rule:

 ${w_j} = {w_j} + \frac{1}{T} \cdot |{x_j} - NM{(x)_j}| - \frac{1}{T}|{x_j} - NH{(x)_j}|$ (2)

① Given a 2-class dataset X with N instances and D features, set the initial weights ${w_j} = 0\;(1 \le j \le D)$ and the maximum number of iterations T, and initialize t = 1.

② Randomly select an instance x from the dataset X and find its nearest same-class instance $NH\left( x \right)$ and nearest different-class instance $NM\left( x \right)$ .

③ For each weight dimension, update the weight using Eq. (2).

④ If t = T, the algorithm ends; otherwise set t = t + 1 and return to step ②.

⑤ Output the updated weight vector w.
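The iteration of steps ①–⑤ can be sketched as follows (a minimal sketch for a 2-class dataset; the function name, the tie handling of `argmin`, and the exclusion of x from its own hit search are our own choices):

```python
import numpy as np

def relief(X, y, T, rng=None):
    """Sketch of the Relief iteration of Eq. (2).
    X: (N, D) feature matrix, y: length-N labels, T: iteration count."""
    rng = np.random.default_rng(rng)
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(T):
        i = rng.integers(N)                 # step 2: pick a random instance x
        x, c = X[i], y[i]
        same = X[y == c]
        same = same[np.any(same != x, axis=1)]   # drop x from its own hits
        diff = X[y != c]
        nh = same[np.argmin(np.linalg.norm(same - x, axis=1))]  # NH(x)
        nm = diff[np.argmin(np.linalg.norm(diff - x, axis=1))]  # NM(x)
        w += (np.abs(x - nm) - np.abs(x - nh)) / T              # Eq. (2)
    return w
```

On a toy dataset whose classes differ mainly in the first feature, the learned weight of that feature comes out larger than the others, as the update rule intends.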

When the k nearest neighbors of each class are used in place of the single nearest hit and miss, the update rule generalizes to:

 \begin{aligned} {w_j} = & {w_j} + \sum\nolimits_{z \in KNN(x,l),l \ne c} {|{x_j} - {z_j}|/T } \\ & - {\sum\nolimits_{z \in KNN(x,c)} {|{x_j} - {z_j}|} } /T \end{aligned} (3)

$KNN(x,c)$ is the set of the k nearest neighbors of x in ${X_c}$ , found using the Euclidean distance.

2.2 Class Dependent Relief

Elena Marchiori [5] studied decomposing Relief into class dependent feature weights and observed that, when a global feature weight is used, summing a feature's weight contributions across different classes can cancel out its relevance to individual classes, so a feature's relevance to a single class may go undetected. They therefore proposed replacing the single feature weight shared by all data with one feature weight per class, ${w_c}$ being the feature weight of class c, which preserves each feature's relevance to its own class. When computing the class weight ${w_c}$ , only instances x of class c are selected; the neighbors of each such instance are found and the class weight is updated. The weight update formula is:

 ${w_c} = \sum\nolimits_{x \in {X_c}} {\left( {\sum\nolimits_{z \in \mathop {KNN}\limits_{l \ne c} (x,l)} {|x - z|} - \sum\nolimits_{z \in KNN(x,c)} {|x - z|} } \right)} /T$ (4)

${w_c}$ is regarded as the feature weight of class c, ${X_c}$ is the set of data points of class c, $KNN\left( {x,c} \right)$ is the set of k nearest neighbors of x with the same label, and $\mathop {KNN}\limits_{l \ne c} (x,l)$ is the set of k nearest neighbors of x with a label other than c. Using Eq. (4), one feature weight can be computed for each class in the dataset.
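Eq. (4) can be sketched as follows (a minimal sketch; the function name and the way x is excluded from its own neighbor set are our assumptions, and |x − z| is taken feature-wise since ${w_c}$ is a vector):

```python
import numpy as np

def cdrelief_weights(X, y, c, k, T):
    """Sketch of the class dependent weight of Eq. (4): over every
    instance x of class c, accumulate the feature-wise absolute
    differences to the k nearest different-class neighbours minus
    those to the k nearest same-class neighbours, divided by T."""
    D = X.shape[1]
    w_c = np.zeros(D)
    Xc, Xo = X[y == c], X[y != c]
    for x in Xc:
        same = Xc[np.any(Xc != x, axis=1)]        # exclude x itself
        hits = same[np.argsort(np.linalg.norm(same - x, axis=1))[:k]]
        misses = Xo[np.argsort(np.linalg.norm(Xo - x, axis=1))[:k]]
        w_c += (np.abs(misses - x).sum(axis=0)
                - np.abs(hits - x).sum(axis=0)) / T
    return w_c
```

Because every instance of the class is visited once, this variant is deterministic, unlike the randomly sampled Relief iteration of Section 2.1.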

3 Instance Weighted Class Dependent Relief

3.1 Instance Weights

An instance weight $IW(x)$ is assigned to each training instance according to how close it lies to the classification boundary:

 $IW(x) = \left\{ {\begin{array}{*{20}{c}} {1,\;\quad {\rm{if}}\;\min ({d_1}/{d_2},{d_2}/{d_1}) > threshold}\\ {0,\;\quad {\rm{if}}\;\min ({d_1}/{d_2},{d_2}/{d_1}) < threshold} \end{array}} \right.$ (5)

where d1 is the sum of the distances from x to its k nearest same-class neighbors and d2 is the sum of the distances to its k nearest different-class neighbors. For instances far from the boundary and for outliers, d1 and d2 are very unbalanced, so the weight is 0 and such points do not influence the update.
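Eq. (5) translates directly into a small helper (the function name is ours; the behavior at exact equality with the threshold is not specified in Eq. (5), and this sketch returns 0 in that case):

```python
def instance_weight(d1, d2, threshold):
    """Instance weight of Eq. (5): 1 when the summed same-class and
    different-class neighbour distances d1, d2 are comparable (ratio
    above threshold, i.e. x lies near the boundary), else 0."""
    return 1 if min(d1 / d2, d2 / d1) > threshold else 0
```

For example, with threshold 0.5 a boundary point with d1 = 1.0 and d2 = 1.2 keeps weight 1, while a far-away point with d1 = 0.1 and d2 = 2.0 is suppressed to 0.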

3.2 New Feature Weight Update Formula

Step1. Set the initial feature weight of every class to ${w_{c,j}} = 0\;(c \in C,1 \le j \le D)$ , set the maximum number of iterations T, and initialize t = 1.

Step2. Take a class label c from the set C.

Step3. Randomly select an instance x of class c from the dataset X and update the weights as follows:

Step3.1. Find the set $KNN(x,c)$ of the k nearest same-class neighbors of x and the set $\mathop {KNN}\limits_{l \ne c} (x,l)$ of the k nearest different-class neighbors, together with the sum d1 of the distances to the k points of $KNN(x,c)$ and the sum d2 of the distances to the k points of $\mathop {KNN}\limits_{l \ne c} (x,l)$ .

Step3.2. Substitute d1 and d2 into Eq. (5) to compute the instance weight $IW(x)$ of x.

Step3.3. Let c be the class label of x and l be the other class label in C. Update the feature weights ${w_{c,j}}\;(1 \le j \le D)$ and ${w_{l,j}}\;(1 \le j \le D)$ of the two classes:

 $\begin{array}{l} {\rm{If}}\;\;\sum\nolimits_{z \in KNN(x,l),l \ne c} {||x - z||} - \sum\nolimits_{z \in KNN(x,c)} {||x - z||} > 0:\\ \;\;\;{w_{c,j}} = {w_{c,j}} + IW(x)\left( {\sum\nolimits_{z \in KNN(x,l),l \ne c} {|{x_j} - {z_j}|} - \sum\nolimits_{z \in KNN(x,c)} {|{x_j} - {z_j}|} } \right)/(T \cdot k)\\ {\rm{else}}:\\ \;\;\;{w_{l,j}} = {w_{l,j}} + IW(x)\left( {\sum\nolimits_{z \in KNN(x,l),l \ne c} {|{x_j} - {z_j}|} - \sum\nolimits_{z \in KNN(x,c)} {|{x_j} - {z_j}|} } \right)/(T \cdot k) \end{array}$

||x–z|| denotes the Euclidean distance between points x and z.

Step4. If t = T, execute Step5; if t < T, set t = t + 1 and return to Step3.

Step5. If all values in C have been taken, the algorithm ends and outputs ${w_c}\;(c \in C)$ ; otherwise return to Step2.
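Putting Step1–Step5 together, the IWCDRELIEF procedure can be sketched as below (a minimal sketch for a 2-class dataset; the function name, random sampling details, and the exclusion of x from its own neighbor set are our own assumptions):

```python
import numpy as np

def iwcdrelief(X, y, k, T, threshold, rng=None):
    """Sketch of IWCDRELIEF: one weight vector per class, updated only
    by instances whose instance weight IW(x) of Eq. (5) is 1."""
    rng = np.random.default_rng(rng)
    D = X.shape[1]
    classes = np.unique(y)
    w = {c: np.zeros(D) for c in classes}          # Step1
    for c in classes:                              # Step2 / Step5 loop
        Xc, Xo = X[y == c], X[y != c]
        l = classes[classes != c][0]               # the other class label
        for _ in range(T):                         # Step3 / Step4 loop
            x = Xc[rng.integers(len(Xc))]
            same = Xc[np.any(Xc != x, axis=1)]
            hits = same[np.argsort(np.linalg.norm(same - x, axis=1))[:k]]
            misses = Xo[np.argsort(np.linalg.norm(Xo - x, axis=1))[:k]]
            d1 = np.linalg.norm(hits - x, axis=1).sum()    # Step3.1
            d2 = np.linalg.norm(misses - x, axis=1).sum()
            iw = 1 if min(d1 / d2, d2 / d1) > threshold else 0  # Step3.2
            delta = iw * (np.abs(misses - x).sum(axis=0)
                          - np.abs(hits - x).sum(axis=0)) / (T * k)
            if d2 - d1 > 0:                        # Step3.3
                w[c] += delta
            else:
                w[l] += delta
    return w
```

On a toy 2-class dataset separated mainly along the first feature, the class-0 weight vector gives that feature the largest value, matching the intent of Step3.3.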

4 Experiments and Analysis

Figure 1. Comparison of the classification accuracy (%) of CDRELIEF and IWCDRELIEF on the experimental datasets under different values of k

5 Conclusion

[1] Kira K, Rendell LA. A practical approach to feature selection. Proceedings of the 9th International Workshop on Machine Learning. Aberdeen, Scotland, UK, 1992. 249-256.
[2] Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 2003, 53(1-2): 23-69.
[3] Urbanowicz RJ, Meeker M, La Cava W, et al. Relief-based feature selection: Introduction and review. Journal of Biomedical Informatics, 2018, 85: 189-203. DOI:10.1016/j.jbi.2018.07.014
[4] Sun YJ. Iterative RELIEF for feature weighting: Algorithms, theories, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(6): 1035-1051. DOI:10.1109/TPAMI.2007.1093
[5] Marchiori E. Class dependent feature weighting and K-nearest neighbor classification. Proceedings of the 8th IAPR International Conference on Pattern Recognition in Bioinformatics. Nice, France, 2013. 69-78.
[6] Pérez-Rodríguez J, Arroyo-Peña AG, García-Pedrajas N. Simultaneous instance and feature selection and weighting using evolutionary computation: Proposal and study. Applied Soft Computing, 2015, 37: 416-443. DOI:10.1016/j.asoc.2015.07.046
[7] Shu WH, Shen H. Incremental feature selection based on rough set in dynamic incomplete data. Pattern Recognition, 2014, 47(12): 3890-3906. DOI:10.1016/j.patcog.2014.06.002
[8] de Haro-García A, Pérez-Rodríguez J, García-Pedrajas N. Combining three strategies for evolutionary instance selection for instance-based learning. Swarm and Evolutionary Computation, 2018, 42: 160-172. DOI:10.1016/j.swevo.2018.02.022
[9] Zhang H, Yu J, Wang M, et al. Semi-supervised distance metric learning based on local linear regression for data clustering. Neurocomputing, 2012, 93: 100-105. DOI:10.1016/j.neucom.2012.03.007
[10] Jiao LM, Pan Q, Feng XX, et al. An evidential K-nearest neighbor classification method with weighted attributes. Proceedings of the 16th International Conference on Information Fusion. Istanbul, Turkey, 2013. 145-150.
[11] Wang W, Hu BG, Wang ZF. Globality and locality incorporation in distance metric learning. Neurocomputing, 2014, 129: 185-198. DOI:10.1016/j.neucom.2013.09.041