Abstract: Although the traditional K-means algorithm has many advantages, its clustering criterion function performs poorly on data sets with uneven cluster density. Building on a weighted standard deviation criterion function, this paper proposes a parallel K-means algorithm designed and optimized for the MapReduce programming model, and adds a convergence test. Compared with the traditional K-means algorithm, the proposed parallel algorithm shows significant improvement in accuracy, speedup ratio, scalability, and convergence of the clustering results. It also reduces the probability of misclassification caused by uneven cluster density, which in turn improves clustering accuracy. Moreover, the optimization effect becomes more pronounced as the data size and the number of nodes increase.