Although many existing clustering algorithms can find clusters of arbitrary shape and varying size, they struggle to produce satisfactory results on multi-density data sets. To improve both the quality and the efficiency of clustering, this paper presents a new precision-improving clustering algorithm based on grids and information entropy. The algorithm uses the information entropy carried by grid cells of different densities to calculate the density threshold automatically, and then identifies the different clusters in the multi-density data set. Experiments show that the algorithm removes noise effectively and finds multi-density clusters, yielding better clustering results.
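As a rough illustration of the two quantities the approach relies on, the following minimal sketch partitions points into a uniform grid, counts the points per cell, and computes the Shannon entropy of the resulting density distribution. The function names, the `cell_size` parameter, and the toy data are assumptions made for illustration only; the rule by which the algorithm turns entropy into a density threshold is not stated here and is deliberately not reproduced.

```python
import math
from collections import Counter


def grid_cell_counts(points, cell_size):
    """Count the points falling into each grid cell.

    Cells are identified by integer index tuples; cell_size is a
    hypothetical uniform cell width chosen for this sketch.
    """
    return Counter(
        tuple(math.floor(coord / cell_size) for coord in point)
        for point in points
    )


def density_entropy(cell_counts):
    """Shannon entropy (in bits) of the point distribution over non-empty cells."""
    total = sum(cell_counts.values())
    return -sum(
        (n / total) * math.log2(n / total) for n in cell_counts.values()
    )


# Toy data (assumed): one dense cell, one sparser cell, one isolated point.
points = [
    (0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (0.4, 0.3),  # cell (0, 0)
    (5.5, 5.5), (5.6, 5.4),                          # cell (5, 5)
    (9.5, 0.5),                                      # cell (9, 0)
]
counts = grid_cell_counts(points, cell_size=1.0)
entropy = density_entropy(counts)
```

In this sketch the entropy is lower when points concentrate in a few cells and reaches its maximum when all non-empty cells hold the same number of points, which is why it can serve as a signal of how unevenly density is spread over the grid.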