Abstract: In the traditional K-means algorithm, the clustering result depends heavily on the randomly selected initial cluster centers and on a manually specified number of clusters K. To improve clustering accuracy, this paper proposes selecting the initial cluster centers using the minimum distance and the average clustering degree, and taking the number of clusters produced by the hierarchical clustering algorithm CURE as the value of K. The improved K-means algorithm is then applied to micro-blog topic discovery. The experimental results indicate that the proposed algorithm improves the accuracy of the clustering results.
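
The two ideas summarized above, replacing random seeding with a deterministic rule and supplying K from outside K-means, can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not define "minimum distance" or "average clustering degree", so the hypothetical helper `pick_initial_centers` uses a stand-in rule (the point with the smallest mean distance to all others, followed by farthest-first selection). K is passed in as a plain argument here, whereas the paper obtains it from CURE. All function names and the toy data are assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_initial_centers(points, k):
    """Deterministic seeding (stand-in rule, NOT the paper's exact criterion)."""
    # Assumption: the first seed is the point with the smallest mean
    # distance to all points, i.e. the most "central" point.
    mean_d = [sum(dist(p, q) for q in points) / len(points) for p in points]
    centers = [points[mean_d.index(min(mean_d))]]
    while len(centers) < k:
        # Farthest-first: the next seed maximizes its distance to the
        # nearest seed chosen so far.
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(points, centers, max_iter=100):
    """Lloyd's iterations starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return labels, centers


# Toy data: two well-separated groups. K = 2 is given by hand here;
# in the paper it would come from CURE.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = pick_initial_centers(points, 2)
labels, centers = kmeans(points, seeds)
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same partition, which removes the run-to-run variation that random initialization introduces.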