﻿ 基于教与学优化改进的近邻传播聚类算法
 计算机系统应用  2020, Vol. 29 Issue (5): 220-225 PDF

Affinity Propagation Clustering Based on Teaching Learning-Based Optimization
MA Pian-Pian, ZHANG Xin-Gang, LIANG Jing-Jing
School of Computer and Information Technology, Nanyang Normal University, Nanyang 473061, China
Foundation item: Science and Technology Program of Henan Province (182102210114); Science and Technology Project of Nanyang Normal University (2019QN020)
Abstract: Aiming at the limitation of the clustering effect caused by the preference and damping factors in Affinity Propagation (AP), a Teaching and Learning Based Optimization (TLBO) algorithm is proposed. First, the search space of parameter p is determined, and then the TLBO algorithm is used to find the optimal parameter value in the search space. At the same time, the damping factor is automatically adjusted to prevent numerical oscillations during the clustering process, so as to improve the clustering quality of AP algorithm. The experimental results show that the algorithm can effectively solve the problem caused by preference and damping factors, improve the contour coefficient of clustering, and reduce the clustering error rate.
Key words: Affinity Propagation (AP)     preference     teaching learning     search space     adaptive

1 教与学的自适应近邻传播聚类 1.1 教与学优化算法

1.1.1 教师阶段

 $Difm = {{r}}\left( {{X_{\rm teacher}} - {T_F}\cdot {X_{\rm mean}}} \right)$ (1)
 ${X_{{\rm new},i}} = \left\{ {\begin{array}{*{20}{l}} {{X_{{\rm old},i}} + Difm,\;\;{\rm{if }}\;{{{F}}_i}{\rm{(}}{{{X}}_{{\rm{old}},i}}{\rm{) < }}{{{F}}_{{i}}}({X_{{\rm{new}},i}})} \\ {{X_{{\rm{old}},i}},\;\;{\rm otherwise}} \end{array}} \right.{\kern 1pt}$ (2)

1.1.2 学生阶段

 ${X_{{\rm{new}},i}} = {X_i} + r\left( {{{{X}}_{{i}}} - {{{X}}_{{j}}}} \right),\;\;{\rm{if}}\;{{F}}({{{X}}_{{i}}}) \ge {{F}}({{{X}}_j})$ (3)
 ${X_{{\rm{new}},{{i}}}} = {X_i} + r\left( {{{{X}}_j} - {{{X}}_i}} \right),\;\;{\rm{if}}\;{{F}}({{{X}}_{{i}}}) \le {{F}}({{{X}}_j})$ (4)
1.2 近邻传播聚类算法

 $F(x,c) = \sum\limits_{i = 1}^N {\min \left\{ {\left. {{{\left\| {{x_i} - {c_i}} \right\|}^2}} \right\}} \right.}$ (5)

 $r(i,k) \leftarrow s(i,k) - \mathop {\max }\limits_{{k'} \ne k} \left\{ {a(i,{k'}) + s(i,{k'})} \right\}$ (6)

 $r(k,k) \leftarrow s(k,k) - \mathop {\max }\limits_{{k'} \ne k} \left\{ {a(i,{k'}) + s(i,{k'})} \right\}$ (7)
 $a(i,k) \doteq \min \left\{ {0,r\left( {k,k} \right) + \sum\limits_{{i'} \notin \{ i,k\} } {\max \{ 0,r({i'},k)\} } } \right\}$ (8)

 $a(k,k) \leftarrow \sum {\max \left\{ {0,r(i,k)} \right\}}$ (9)

 $\left\{ {\begin{array}{*{20}{c}} {{R_{{\rm{new}}}} = \lambda *{R_{{\rm{old}}}} + (1 - \lambda )*{R_{{\rm{new}}}}}\\ {{A_{{\rm{new}}}} = \lambda *{A_{{\rm{old}}}} + (1 - \lambda )*{A_{{\rm{new}}}}} \end{array}} \right.$ (10)
1.3 基于教与学优化的近邻传播聚类

1) 加载数据集.

2) 对数据集进行归一化处理. 算法种群规模为n, 对应数据集中样本个数, 决策变量为偏向参数p, 最大迭代次数为Tmax.

3) 采用负的欧式距离建立相似度矩阵S.

4) 根据式(6)~(9)执行近邻传播聚类, 同时根据聚类目标函数(5)计算目标函数值.

5) 找出目标函数值最大的值作为群体内的教师Xteacher, 计算群体平

6) 教师阶段: 按照式(1)和式(2)产生新个体, 如果新个体的目标函数值优于原有个体, 则保留新个体的值.

7) 学生阶段: 个体根据式(3)和式(4)产生新个体, 如果新个体的目标函数值优于原有个体, 则保留新个体的值.

8) 循环执行直到最大迭代次数为Tmax, 找出最小目标函数值下的对应参数.

9) 找出簇中心并进行分配, 输出最终的簇.

 $dpsim1 = \max \left\{ {\sum\limits_j {s(i,j)} } \right\}$ (11)

 $dpsim2 = \mathop {\max }\limits_{i \ne j} \left\{ {\sum\limits_k {\max \{ s\left( {i,k} \right),s(j,k)\} } } \right\}$ (12)

2 实验与分析

1)准确率(ACC)

 $ACC = \sum\limits_{i = 1}^K {\frac{{{L_i}}}{{NU{M_i}}}}$ (13)

2)正则化互信息(NMI)

 $NMI = \frac{{\displaystyle\sum\limits_{i = 1}^K {\sum\limits_{j = 1}^K {{N_{{{i,j}}}}\log \frac{{N{N_{{{i,j}}}}}}{{{N_{{i}}}{N_{\rm{j}}}}}} } }}{{\sqrt {\displaystyle\sum\limits_{i = 1}^K {{N_{\rm{i}}}\log \frac{{{N_{{i}}}}}{N}\displaystyle\sum\limits_{j = 1}^K {{N_j}\log \frac{{{N_{{j}}}}}{N}} } } }}$ (14)

3) Rand指数(RI)

Rand指数是聚类性能度量中将聚类结果与外部“某个参考模型”相对比的外部指标.

 $RI = \frac{{a + {{d}}}}{{a + b + c + d}}$ (15)

3 总结

 [1] Frey BJ, Dueck D. Clustering by passing messages between data points. Science, 2007, 315(5814): 972-976. DOI:10.1126/science.1136800 [2] Zhang XL, Furtlehner C, Germain-Renaud C, et al. Data stream clustering with affinity propagation. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(7): 1644-1656. DOI:10.1109/TKDE.2013.146 [3] Devika G, Parthasarathy S. Fuzzy statistics-based affinity propagation technique for clustering in satellite cloud image. European Journal of Remote Sensing, 2018, 51(1): 754-764. DOI:10.1080/22797254.2018.1482731 [4] Guan RC, Shi XH, Marchese M, et al. Text clustering with seeds affinity propagation. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(4): 627-637. DOI:10.1109/TKDE.2010.144 [5] Bodenhofer U, Kothmeier A, Hochreiter S. APCluster: An R package for affinity propagation clustering. Bioinformatics, 2011, 27(17): 2463-2464. DOI:10.1093/bioinformatics/btr406 [6] Xiao Y, Yu J. Semi-supervised clustering based on affinity propagation algorithm. Journal of Software, 2008, 19(11): 2803-2813. [7] Arzeno NM, Vikalo H. Semi-supervised affinity propagation with soft instance-level constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(5): 1041-1052. DOI:10.1109/TPAMI.2014.2359454 [8] 谢文斌, 童楠, 王忠秋, 等. 基于粒子群的近邻传播算法. 计算机系统应用, 2014, 23(3): 103-107, 76. DOI:10.3969/j.issn.1003-3254.2014.03.017 [9] Jia B, Yu B, Wu Q, et al. Adaptive affinity propagation method based on improved cuckoo search. Knowledge-Based Systems, 2016, 111: 27-35. DOI:10.1016/j.knosys.2016.07.039 [10] 苏一丹, 房骁, 覃华, 等. 量子近邻传播聚类算法的研究. 广西大学学报(自然科学版), 2018, 43(2): 561-568. [11] 覃华, 詹娟娟, 苏一丹. 基于概率无向图模型的近邻传播聚类算法. 控制与决策, 2017, 32(10): 1796-1802. [12] Sun L, Liu RN, Xu JC, et al. An affinity propagation clustering method using hybrid kernel function with LLE. IEEE Access, 2018, 6: 68892-68909. DOI:10.1109/ACCESS.2018.2880271 [13] Fan ZY, Jiang J, Weng SQ, et al. Adaptive density distribution inspired affinity propagation clustering. Neural Computing and Applications, 2019, 31(S1): 435-445. DOI:10.1007/s00521-017-3024-6 [14] 杭文龙, 蒋亦樟, 刘解放, 等. 迁移近邻传播聚类算法. 软件学报, 2016, 27(11): 2796-2813. DOI:10.13328/j.cnki.jos.004921 [15] Arzeno NM, Vikalo H. Evolutionary affinity propagation. Proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. New Orleans, LA, USA. 2017. 2681–2685. [16] Rao RV, Savsani VJ, Vakharia DP. Teaching-learning-based optimization: A novel method for constrained mechanical design optimization problems. Computer-Aided Design, 2011, 43(3): 303-315. DOI:10.1016/j.cad.2010.12.015 [17] Yu J, Cheng QS. The upper bound of the optimal number of clusters in fuzzy clustering. Science in China Series F: Information Sciences, 2001, 44(2): 119-125. [18] 王开军, 张军英, 李丹, 等. 自适应仿射传播聚类. 自动化学报, 2007, 33(12): 1242-1246.