K-Means Clustering Algorithm Based on Hadoop
    Abstract:

    The traditional K-means algorithm has many advantages, but its clustering criterion function performs poorly on data sets whose clusters have uneven density. Building on a weighted standard deviation criterion function, this paper proposes a parallel K-means algorithm designed and optimized for the MapReduce programming model, and adds a convergence test. Compared with the traditional K-means algorithm, the parallel algorithm shows significant improvements in accuracy, speedup ratio, scalability, and convergence of the clustering results. It also reduces the probability of misclassification caused by uneven cluster density, which improves clustering accuracy. The optimization effect becomes more pronounced with larger data sizes and more nodes.
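
As a rough illustration of how a K-means iteration can be split into map and reduce steps with a convergence test, here is a minimal single-process sketch. It is not the paper's implementation: it uses the standard Euclidean-distance criterion rather than the weighted standard deviation criterion (which the abstract does not define), it does not use Hadoop, and the names `map_phase`, `reduce_phase`, `kmeans`, `tol`, and `max_iter` are all hypothetical.

```python
import math
from collections import defaultdict


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for each point."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce step: recompute each centroid as the mean of its points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)

    updated = []
    for i, old in enumerate(centroids):
        members = groups.get(i)
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            updated.append(tuple(old))
            continue
        dim = len(old)
        updated.append(tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        ))
    return updated


def kmeans(points, centroids, tol=1e-6, max_iter=100):
    """Repeat map/reduce rounds until no centroid moves more than tol."""
    iteration = 0
    for iteration in range(1, max_iter + 1):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        # Convergence test: largest centroid displacement in this round.
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids, iteration


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 0), (10, 2)]
    final, rounds = kmeans(data, [(0, 0), (10, 0)])
    print(final, rounds)
```

In a real MapReduce job the map step would run in parallel over input splits and the reduce step would run once per centroid key; the convergence check would then decide whether the driver launches another job.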

Get Citation

Liu Baolong, Su Jin. K-Means Clustering Algorithm Based on Hadoop. Computer Systems & Applications, 2017, 26(6): 182-186

History
  • Received: September 06, 2016
  • Revised: October 19, 2016
  • Online: June 08, 2017