﻿ 基于PCA优化的PSO-FCM聚类算法
 计算机系统应用  2020, Vol. 29 Issue (3): 213-217 PDF

Optimized PSO-FCM Cluster Algorithm Based on Principal Component Analysis
CHEN Cheng, LIU Zhen-Yu
School of Computer Science and Technology, University of South China, Hengyang 421001, China
Foundation item: National Natural Science Foundation of China (61402220, 61502221); Philosophic Social Science Fund of Hunan Province (16YBA323); Natural Science Foundation of Hunan Province (2015JJ3015); Youth Program of Education Bureau, Hunan Provinve (15B207, 18B279); Innovative Research Program of University of South China (193YXC015)
Abstract: For multi-cluster problems, PSO-FCM cluster algorithm is lack of performance and easily leads to local optimum, which affects the accuracy of multi-cluster result. To tackle these issues, an optimized PSO-FCM cluster algorithm based on PCA is put forward. By introducing PCA processing method, setting different movement weight on each dimension of particle and reducing particle sensitivity, reasonably controlling movement speed of particles on each dimension and effectively decreasing unconstrained particles on each dimension, possibility of moving into false cluster is increased due to over-sensitive particles on interface of multi-cluster groups. This paper introduces related conditions of PSO-FCM algorithm briefly and the proposed optimized algorithm in detail. Finally, this paper presents the experiment results, i.e., the optimized algorithm proposed in this study is totally better than other algorithms in many data sets.
Key words: fuzzy clustering     principal component analysis     particle swarm fuzzy clustering

1 算法介绍

PSO-FCM算法是模糊均值聚类算法基础上的优化算法, 传统的模糊C均值算法的结果精度, 对初始中心的选取有很严格的要求, 并且容易陷入局部最优解. 为了解决这个问题, 国内许多学者, 利用具有集体智能的粒子群优化算法, 与传统模糊C均值算法结合. 利用PSO算法求解初始聚类中心, 进而优化了FCM依赖初始中心的问题; 利用PSO算法中, 粒子个体与粒子群体之间关系, 粒子整体移动的速度可以调节, 进而降低了FCM容易陷入最优解的可能性.

1.1 PSO-FCM算法

PSO-FCM算法是基于数据样本之间的模隶属矩阵建立的聚类算法. 算法的核心思想是: n个文本样本为 $X = ({x_1},{x_2},\cdots,{x_n})$ , 划分为 ${\rm{C}} = ({c_1},{c_2},\cdots,{c_n})$ , p个聚类中心, 计算出每个文本的隶属度 ${\mu _{ij}}$ , ${\mu _{ij}}$ 表示第j个样本隶属于第i个样本的隶属度.

 $\sum\nolimits_{i = 1}^p {{\mu _{ij}}} = 1,{\mu _{ij}} \in \left[ {0,1} \right]$ (1)
 {\mu _{ij}} = \left\{ {\begin{aligned} &{1/\sum\nolimits_{k = 1}^p {{{\left( {{d_{ij}}/{d_{kj}}} \right)}^{\tfrac{2}{{m - 1}}}}} }{,\;{d_{kj}} \ne 0}\\ &0{,\;\;\;{d_{kj}} = 0} \end{aligned}} \right. (2)

 ${J_m} = \sum\nolimits_{j = 1}^n {\sum\nolimits_{i = 1}^p {{{({\mu _{ij}})}^m}{{({x_j} - {c_i})}^2}} }$ (3)
 ${c_i} = \frac{{\displaystyle \sum\nolimits_{j = 1}^n {{{({\mu _{ij}})}^m}{x_j}} }}{{\displaystyle \sum\nolimits_{j = 1}^n {{{({\mu _{ij}})}^m}} }}$ (4)
 $\nu _i^{k + 1} = \left\{ {\begin{array}{*{20}{l}} {\varpi \nu _i^k + {c_1}{r_1}({\rho _i} - x_i^k) + {c_2}{r_2}({\sigma _i} - x_i^k)}{,\;\;k > 1} \\ {{c_2}{r_2}({\sigma _i} - x_i^k)}{,\;\;\;k = 1} \end{array}} \right.$ (5)
 $x_i^{k + 1} = x_i^k + \nu _i^{k + 1}$ (6)

2 算法优化 2.1 PCA算法

 \left\{ \begin{aligned} & {\gamma _1} = {\eta _1}{r_1}, \;{\gamma _2} = {\eta _2}{r_2}\\ &\nu _i^{k + 1} = \left\{ {\begin{array}{*{20}{l}} {\varpi \nu _i^k + {c_1}{\gamma _1}({\rho _i} - x_i^k) + {c_2}{\gamma _2}({\sigma _i} - x_i^k)}{,\;\;k > 1} \\ {{c_2}{r_2}({\sigma _i} - x_i^k)}{,\;\;\;k = 1} \end{array}} \right. \end{aligned}\right. (7)
2.2 优化算法实现

1. $\scriptstyle {C^{\rm best}} \leftarrow {C_0}$

2. $\scriptstyle J_m^{(0)} \leftarrow {\sum\nolimits_{j = 1}^n {\sum\nolimits_{i = 1}^p {{{({\mu _{ij}})}^m}\left({x_j} - c_i^{(0)}\right)} } ^2}$

3. $\scriptstyle J_m^{\rm best} \leftarrow J_m^{(0)}$

4. for $\scriptstyle t = 1,2,\cdots,T$ do

5. 　 $\scriptstyle {\rho _{\rm best}} \leftarrow x_i^t$

6. 　 $\scriptstyle pindex \leftarrow \arg \max ({\mu _{ij}},j = 1,2,\cdots,p)$

7. 　 $\scriptstyle {\sigma _{\rm best}} \leftarrow c_{ pindex}^{(0)}$

8. 　for $\scriptstyle i = 1,2,\cdots,I$ do

9. 　　 $\scriptstyle \nu _i^{t + 1} \leftarrow {c_2}{\gamma _2}({\sigma _2} - x_i^t)$

10. 　　 $\scriptstyle x_i^{t + 1} \leftarrow x_i^t + \nu _i^{t + 1}$

11. 　　if $\scriptstyle J_m^{\rm best} > J_m^{(t)}$ then

12. 　　　 $\scriptstyle J_m^{\rm best} \leftarrow J_m^{(t)}$

13. 　　 $\scriptstyle {J_m} = \sum\nolimits_{j = 1}^n {\sum\nolimits_{i = 1}^p {{{({\mu _{ij}})}^m}{{({x_j} - {c_i})}^2}} }$

14. 　　 $\scriptstyle {C^{\rm best}} = \frac{{\sum\nolimits_{j = 1}^n {{{({\mu _{ij}})}^m}{x_j}} }}{{\sum\nolimits_{j = 1}^n {{{({\mu _{ij}})}^m}} }}$

15. 　　else continue

16. end for

17. 输出中心 $\scriptstyle {C^{\rm best}}$ , $\scriptstyle J_m^{\rm best}$

3 实验分析 3.1 实验数据处理

3.2 模型评价指标

3.3 检验模型性能

 图 1 PCA-PSO-FCM高贡献率图

 图 2 PSO-FCM高贡献率图

4 结论与展望

 [1] 许磊, 张凤鸣. 基于PSO的模糊聚类算法. 计算机工程与设计, 2006, 27(21): 4128-4129. DOI:10.3969/j.issn.1000-7024.2006.21.054 [2] Mekhmoukh A, Mokrani K. Improved fuzzy C-means based Particle Swarm Optimization (PSO) initialization and outlier rejection with level set methods for MR brain image segmentation. Computer Methods and Programs in Biomedicine, 2015, 122(2): 266-281. DOI:10.1016/j.cmpb.2015.08.001 [3] Filho TMS, Pimentel BA, Souza RMCR, et al. Hybrid methods for fuzzy clustering based on fuzzy c-means and improved particle swarm optimization. Expert Systems with Applications, 2015, 42(17–18): 6315-6328. DOI:10.1016/j.eswa.2015.04.032 [4] Jie LL, Liu WD, Sun Z, et al. Hybrid fuzzy clustering methods based on improved self-adaptive cellular genetic algorithm and optimal-selection-based fuzzy c-means. Neurocomputing, 2017, 249: 140-156. DOI:10.1016/j.neucom.2017.03.068 [5] Benaichouche AN, Oulhadj H, Siarry P. Improved spatial fuzzy c-means clustering for image segmentation using PSO initialization, Mahalanobis distance and post-segmentation correction. Digital Signal Processing, 2013, 23(5): 1390-1400. DOI:10.1016/j.dsp.2013.07.005 [6] 邱宁佳, 李娜, 胡小娟, 等. 基于粒子群优化的朴素贝叶斯改进算法. 计算机工程, 2018, 44(11): 27-32, 39. [7] 李锋. 粒子群模糊聚类算法在入侵检测中的研究. 计算机技术与发展, 2014, 24(12): 138-141. [8] 陈寿文. 基于Chebyshev映射的混沌粒子群融合FCM聚类算法. 计算机应用与软件, 2015, 32(7): 255-258. DOI:10.3969/j.issn.1000-386x.2015.07.062 [9] 余晓东, 雷英杰, 岳韶华, 等. 基于粒子群优化的直觉模糊核聚类算法研究. 通信学报, 2015, 36(5): 78-84. [10] 雷浩辖, 刘念, 崔东君, 等. 基于GA与PSO混合优化FCM聚类的变压器故障诊断. 电力系统保护与控制, 2011, 39(22): 52-56. DOI:10.7667/j.issn.1674-3415.2011.22.010 [11] 李元, 白岩松. 改进主成分分析的KNN故障检测研究. 沈阳化工大学学报, 2018, 32(4): 366-371. DOI:10.3969/j.issn.2095-2198.2018.04.014 [12] 王帅, 黄海鸿, 韩刚, 等. 基于PCA与GA-BP神经网络的磁记忆信号定量评价. 电子测量与仪器学报, 2018, 32(10): 190–196. [13] 徐明月, 林泽轩, 顾彦. 基于PCA-SVM模型的红斑鳞屑性皮肤病识别研究. 杭州电子科技大学学报(自然科学版), 2018, 38(6): 35-40.