Grid-Based and Information Entropy-Based Clustering Algorithm for Multi-Density
Authors: Zhou Yuelai, Tan Jianhao
Funding:

Hunan Provincial Natural Science Foundation (08JJ3132)
    Abstract:

    Although many existing clustering algorithms can discover clusters of arbitrary shape and size, they struggle to produce satisfactory results on multi-density data sets. To improve clustering quality on such data, this paper proposes a multi-density clustering algorithm based on grids and information entropy. The algorithm uses the information entropy carried by grid cells of different densities to compute density thresholds automatically, and then identifies the distinct clusters in a multi-density data set. Experiments show that the algorithm removes noise effectively, discovers clusters of different densities, and achieves good clustering results.
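The abstract does not spell out how the entropy-based density thresholds are computed, so the following is only a minimal sketch of the general grid-plus-entropy idea under stated assumptions. It bins 2-D points into square cells of a hypothetical side length `cell_size`, chooses a density threshold with Kapur's maximum-entropy thresholding over the histogram of cell densities (a standard entropy criterion from image segmentation, swapped in here and not necessarily the authors' formula), treats cells at or below the threshold as noise, and joins the remaining cells into clusters through 8-neighbour adjacency (also an assumption). The function names `grid_cells`, `kapur_threshold`, `cluster`, and `demo_points` are hypothetical. The sketch derives a single threshold, whereas the paper describes thresholds derived for grids of different densities.

```python
from collections import defaultdict, deque
import math


def grid_cells(points, cell_size):
    """Map each grid cell (i, j) to the indices of the 2-D points it contains."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)
    return cells


def kapur_threshold(densities):
    """Kapur maximum-entropy threshold over the histogram of cell densities.

    Assumed criterion (not taken from the paper). Returns t; cells with
    density > t are treated as dense, the rest as noise.
    """
    hist = defaultdict(int)
    for d in densities:
        hist[d] += 1
    total = len(densities)
    levels = sorted(hist)
    if len(levels) < 2:
        return levels[0] - 1  # a single density level: keep every cell
    best_t, best_h = levels[0], -math.inf
    for k in range(len(levels) - 1):  # both groups must be non-empty
        h = 0.0
        for group in (levels[: k + 1], levels[k + 1:]):
            mass = sum(hist[d] for d in group) / total
            for d in group:
                q = hist[d] / total / mass  # probability within the group
                h -= q * math.log(q)
        if h > best_h:  # strict: ties keep the lowest threshold
            best_t, best_h = levels[k], h
    return best_t


def cluster(points, cell_size):
    """Label each point with a cluster id, or -1 for noise."""
    if not points:
        return []
    cells = grid_cells(points, cell_size)
    t = kapur_threshold([len(m) for m in cells.values()])
    dense = {c for c, members in cells.items() if len(members) > t}
    cell_label, next_id = {}, 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        # Breadth-first search over 8-connected dense cells (assumed adjacency).
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    nb = (i + di, j + dj)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_id
                        queue.append(nb)
        next_id += 1
    labels = [-1] * len(points)
    for c, cid in cell_label.items():
        for idx in cells[c]:
            labels[idx] = cid
    return labels


def demo_points():
    """Hypothetical data: a sparse and a dense cluster plus isolated noise."""
    pts = []
    for i in range(3):  # sparse cluster: 3x3 cells, 5 points per cell
        for j in range(3):
            pts += [(i + 0.05 + 0.1 * k, j + 0.5) for k in range(5)]
    for i in range(10, 12):  # dense cluster: 2x2 cells, 10 points per cell
        for j in range(10, 12):
            pts += [(i + 0.02 + 0.05 * k, j + 0.5) for k in range(10)]
    pts += [(20.5, 0.5), (0.5, 20.5), (25.5, 25.5)]  # isolated noise points
    return pts


if __name__ == "__main__":
    labels = cluster(demo_points(), cell_size=1.0)
    print(sorted(set(labels)))  # -> [-1, 0, 1]
```

In the demo data the noise cells hold one point each, so the entropy criterion places the threshold at 1 and both the 5-point-per-cell and the 10-point-per-cell clusters survive while the isolated points are labelled -1. Two touching clusters of different density would still merge in this single-threshold sketch; separating them requires density-level-specific thresholds, which is the case a multi-density method such as the one described in the abstract targets.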

    参考文献
    1 Han JW, Kamber M.范明,孟小峰译.数据挖据:概念与技术第2 版.北京:机械工业出版社,2007.251-253.
    2 Uncu O, Gruver WA, Kotak DB. GRIDBSCAN:Griddensity-based spatial clustering of applications with noise.
    2006 IEEE International Conference on Systems, Man, andCybernetics, Taipei, October 8-11, 2006.
    3 Karypis G, Han EH, Kumar V. Chameleon: a hierarchicalclustering algorithm using dynamic modeling. IEEEComputer, 1999,32(8):68-75.
    4 Ertoz L, Steinbach M, Kumar V. Finding clusters of differentsizes, shapes, and densities in noisy, high dimensional data.Proc. of the 3rd SIAM International Conference on DataMining. San Francisco: SIAM Press, 2003. 1-12.
    5 Song G, Ying X. Gdcic: a grid-base densityconfidenceintervalclustering algorithm for multi-density dataset in largespatial database. Proc. of the 6th International Conference onIntelligent Systems Design and Applications. WashingtonDC; IEEE Computer, 2006.713-717.
    6 赵艳厂,宋梅,采德德,等.用于不同密成聚类的多阶段等密度线算法.北京邮电大学学报,2003,26(2):42-47.
    7 夏英,李克非,丰江帆.基于网格梯度的多密度聚类算法.计算机应用研究,2008,25(11):3278-3280.
    8 阮吉寿,张华.信息论基础.北京:机械工业出版社,2008.7-11.
    9 Hsu CM, Chen MS. Subspace Clustering of HighDimensional Spatial Data with Noises. Heidelberg: Springer,2004.31-40.
    10 Qiu BZ, Li XL, Shen JY. Grid-Based Clustering AlgorithmBased on Intersecting Partition and Density Estimation.Proc of PAKDD. Berlin: Springer, 2007. 368-377.
    11 程国庆,陈晓云.基于网格相对密度的多密度聚类算法.算机工程与应用,2009,45(1):156-158.
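Cite this article:

Zhou Yuelai, Tan Jianhao. Grid-Based and Information Entropy-Based Clustering Algorithm for Multi-Density. Computer Systems & Applications, 2011, 20(10): 189-192.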

History
  • Received: 2011-03-03
  • Last revised: 2011-03-26
Copyright: Institute of Software, Chinese Academy of Sciences