Abstract: The classical K-means clustering algorithm easily becomes trapped in a local optimum. To address this shortcoming, an improved genetic clustering algorithm based on Hadoop is proposed and implemented. The algorithm exploits the global search capability and inherent parallelism of the genetic algorithm to avoid local optima. The genetic algorithm is first improved and then combined with the classical K-means algorithm, and the resulting genetic clustering algorithm is implemented on Hadoop to improve execution efficiency. Experiments comparing the proposed method with the classical clustering algorithm show that it substantially improves both clustering accuracy and efficiency.
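
To make the combination of a genetic algorithm with K-means concrete, the following is a minimal single-process sketch, not the implementation described in this paper. Everything in it is an assumption chosen for illustration: the function names (`sse`, `kmeans_step`, `genetic_kmeans`), the encoding of an individual as a list of k centroids, tournament selection, single-point crossover, Gaussian mutation, elitism, and all parameter values. The abstract does not specify the improved genetic operators, and the Hadoop (MapReduce) distribution of the work is omitted entirely.

```python
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ce)) for ce in centroids)
        for pt in points
    )


def kmeans_step(points, centroids):
    """One K-means iteration: assign points, then recompute the means."""
    clusters = [[] for _ in centroids]
    for pt in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(pt, centroids[i])),
        )
        clusters[nearest].append(pt)
    updated = []
    for ce, members in zip(centroids, clusters):
        if members:
            updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            updated.append(ce)  # an empty cluster keeps its old centroid
    return updated


def genetic_kmeans(points, k, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Hypothetical GA + K-means hybrid; all operators are assumptions."""
    rng = random.Random(seed)
    # Each individual is a list of k centroids drawn from the data.
    population = [list(rng.sample(points, k)) for _ in range(pop_size)]
    best = min(population, key=lambda ind: sse(points, ind))

    def pick():
        # Tournament selection of size 2: the lower SSE wins.
        a, b = rng.sample(population, 2)
        return a if sse(points, a) <= sse(points, b) else b

    for _ in range(generations):
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = p1[:cut] + p2[cut:]  # single-point crossover
            child = [
                tuple(x + rng.gauss(0, 0.1) for x in ce)
                if rng.random() < mutation_rate else ce
                for ce in child
            ]  # Gaussian mutation of whole centroids
            # Local refinement: one K-means iteration per offspring.
            children.append(kmeans_step(points, child))
        population = children
        best = min(population, key=lambda ind: sse(points, ind))
    return best, sse(points, best)
```

A call such as `genetic_kmeans(points, k=2)`, where `points` is a list of equal-length numeric tuples, returns the best centroid list found and its sum of squared errors. Because the population explores many initializations at once and the elite individual is never discarded, the best error cannot increase from one generation to the next, which is the property that helps such a hybrid escape the local optima that a single K-means run can fall into.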