Parallel Implementation of Improved K-means Algorithm Based on Spark
    Abstract:

    Traditional K-means has high computational complexity and insufficient computing capacity when processing massive data. To address this, an SKDk-means (Spark-based kd-tree K-means) parallel clustering algorithm is proposed. The algorithm uses a kd-tree to improve the selection of the initial center points, which overcomes the tendency of traditional K-means to fall into a local optimum because of the uncertainty of the initial points. During the K-means iterations, kd-tree nearest-neighbor search reduces redundant computation and speeds up clustering. The algorithm is parallelized on the Spark platform and applied to clustering massive data. Experimental results show that the algorithm achieves good accuracy and parallel computing performance.
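
    The abstract does not spell out how the kd-tree is used inside the iteration, so the following is only a minimal single-machine sketch under one plausible reading: a kd-tree is built over the current cluster centers, and each point finds its center by nearest-neighbor search instead of comparing against every center. The names `build_kdtree`, `nearest`, and `kmeans_step` are hypothetical and do not come from the paper. The kd-tree based choice of initial centers is not shown.

```python
from collections import namedtuple

# One kd-tree node: the stored point, its index in the input list,
# the splitting axis, and the two subtrees.
Node = namedtuple("Node", "point index axis left right")


def build_kdtree(points, depth=0, indices=None):
    """Build a kd-tree by splitting on the median along one axis per level."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(
        points[indices[mid]],
        indices[mid],
        axis,
        build_kdtree(points, depth + 1, indices[:mid]),
        build_kdtree(points, depth + 1, indices[mid + 1:]),
    )


def nearest(node, query, best=None):
    """Return (squared_distance, index) of the stored point closest to query."""
    if node is None:
        return best
    dist = sum((a - b) ** 2 for a, b in zip(node.point, query))
    if best is None or dist < best[0]:
        best = (dist, node.index)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match so far; this pruning is where distance computations are saved.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def kmeans_step(points, centers):
    """One K-means iteration: assign points via the kd-tree, then update centers."""
    tree = build_kdtree(centers)
    labels = [nearest(tree, p)[1] for p in points]
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p, k in zip(points, labels):
        counts[k] += 1
        for j in range(dim):
            sums[k][j] += p[j]
    # A center with no assigned points is left where it was (an assumption).
    new_centers = [
        tuple(s / counts[k] for s in sums[k]) if counts[k] else tuple(centers[k])
        for k in range(len(centers))
    ]
    return labels, new_centers
```

    On Spark, one natural arrangement would be to run the assignment step on each data partition and merge the per-cluster sums and counts afterwards. That arrangement is likewise an assumption here, not a description of the authors' implementation.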

Get Citation

Song Dongfei, Xu Hua. Parallel Implementation of Improved K-means Algorithm Based on Spark. Computer Systems & Applications, 2018, 27(4): 151-156

History
  • Received: July 23, 2017
  • Revised: August 09, 2017
  • Online: April 03, 2018