Parallel Implementation of Improved K-means Algorithm Based on Spark

doi:10.15888/j.cnki.csa.006296

Volume 27, Issue 4, 2018, pp. 151-156

Authors: SONG Dong-Fei, XU Hua
School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China

                    

Abstract:

Traditional K-means is computationally expensive and lacks sufficient computing capacity when processing massive data. To address these problems, the paper proposes SKDk-means (Spark-based kd-tree K-means), a parallel clustering algorithm. The algorithm uses a kd-tree to improve the selection of the initial center points, which overcomes the tendency of traditional K-means to fall into a local optimum because of uncertainty in the initial points. During the K-means iterations, nearest-neighbor search on the kd-tree reduces redundant computation and speeds up clustering. The algorithm is parallelized on the Spark platform and applied to clustering massive data. Experimental results show that the algorithm achieves good accuracy and parallel computing performance.

Keywords: kd-tree; Spark; K-means; parallel; cloud computing
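
The abstract does not give implementation details, so the following is only a minimal single-machine sketch of one idea it mentions: using kd-tree nearest-neighbor search in the K-means assignment step. It assumes the tree is built over the current centers and that initial centers are supplied by the caller; the names `build_kdtree`, `nearest`, and `kmeans_kdtree` are hypothetical and not taken from the paper, and neither the kd-tree based initialization nor the Spark parallelization is reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    """One kd-tree node holding a center and its index in the center list."""

    def __init__(self, point: Point, index: int, axis: int,
                 left: "Optional[KDNode]", right: "Optional[KDNode]") -> None:
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right


def build_kdtree(items: List[Tuple[Point, int]], depth: int = 0) -> Optional[KDNode]:
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, index = items[mid]
    return KDNode(point, index, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[KDNode], target: Point,
            best: Optional[Tuple[float, int]] = None) -> Optional[Tuple[float, int]]:
    """Return (squared distance, index) of the stored point closest to target."""
    if node is None:
        return best
    d = sq_dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.index)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def kmeans_kdtree(points: List[Point], centers: List[Point],
                  max_iter: int = 20) -> List[Point]:
    """Lloyd-style K-means whose assignment step queries a kd-tree of centers."""
    centers = list(centers)
    dim = len(centers[0])
    for _ in range(max_iter):
        tree = build_kdtree([(c, i) for i, c in enumerate(centers)])
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for p in points:
            _, idx = nearest(tree, p)
            counts[idx] += 1
            for j, x in enumerate(p):
                sums[idx][j] += x
        # An empty cluster keeps its previous center.
        new_centers = [tuple(s / counts[i] for s in sums[i]) if counts[i] else centers[i]
                       for i in range(len(centers))]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

For example, `kmeans_kdtree(points, initial_centers)` returns the final list of centers. The per-cluster `sums` and `counts` are additive, so in a distributed setting each data partition could compute them locally and the results could be merged afterwards; whether the paper's Spark implementation is organized this way is not stated in the abstract.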

Citation:

SONG Dong-Fei, XU Hua. Parallel Implementation of Improved K-means Algorithm Based on Spark. Computer Systems & Applications, 2018, 27(4): 151-156


History

Received: July 23, 2017
Revised: August 9, 2017
Online: April 3, 2018
