Received: November 17, 2020; Revised: December 21, 2020
Abstract: To address the tendency of the classical K-means clustering algorithm to become trapped in local optima, an improved genetic clustering algorithm based on Hadoop is proposed and implemented. The algorithm uses the global search capability and inherent parallelism of the genetic algorithm to counter this weakness of K-means. The genetic algorithm is first improved and then combined with the K-means algorithm, and the combined algorithm is implemented on the Hadoop platform to improve execution efficiency. Experiments comparing the proposed method with the classical clustering algorithm show that it substantially improves both clustering accuracy and clustering efficiency.
Keywords: K-means; text clustering; genetic algorithm; Hadoop; parallelism
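The abstract does not spell out the specific improvements to the genetic algorithm or the Hadoop implementation, so the following is only a minimal single-machine sketch of the general idea it describes: a genetic search over sets of centroids, with one K-means refinement step applied to each offspring. All function names, parameters, and operator choices (tournament selection, one-point crossover, Gaussian mutation, elitism) are assumptions made for illustration and are not taken from the paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans_step(points, centroids):
    """One K-means iteration: assign points, then recompute centroids."""
    labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
              for p in points]
    updated = []
    for j, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            updated.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(c))))
        else:
            updated.append(c)  # empty cluster: keep the old centroid
    return updated

def genetic_kmeans(points, k, pop_size=20, generations=30,
                   mutation_rate=0.1, sigma=0.5, seed=0):
    """Hypothetical GA + K-means hybrid; a chromosome is a list of k centroids."""
    rng = random.Random(seed)
    cost = lambda chrom: sse(points, chrom)  # lower cost = fitter chromosome
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=cost)
        next_pop = [ranked[0]]  # elitism: the best chromosome always survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = min(rng.sample(ranked, 3), key=cost)
            p2 = min(rng.sample(ranked, 3), key=cost)
            # One-point crossover on the list of centroids.
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation applied independently to each coordinate.
            child = [tuple(x + rng.gauss(0, sigma) if rng.random() < mutation_rate
                           else x for x in c) for c in child]
            # Local refinement: one K-means step on the offspring.
            next_pop.append(kmeans_step(points, child))
        population = next_pop
    best = min(population, key=cost)
    return best, cost(best)
```

In this sketch the cost of each chromosome is evaluated independently of the others, which is the part that would most naturally be distributed across workers; how the paper maps this onto Hadoop is not stated in the abstract.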
Funding: National Natural Science Foundation of China (61702093); Youth Science Foundation of Northeast Petroleum University (2020QNL-02)
Citation:
PAN Jun-Hui, WANG Hui, ZHANG Qiang, WANG Hao-Chang. Improved Genetic Clustering Algorithm Based on Hadoop. Computer Systems Applications, 2021, 30(9): 242-246