A Grid and MST Based Clustering Algorithm for Data Streams
Authors: Wang Haibo, Wang Xianpeng, Wang Fang, Chen Zhiguo

    Abstract:

    The CluStream algorithm clusters non-spherical clusters poorly, and most clustering algorithms based on uniform grid partitioning gain efficiency at the cost of clustering accuracy. To address both problems, this paper presents GTSClu, a new grid-based minimum spanning tree (MST) clustering algorithm for data streams. GTSClu is divided into an online processing phase and an offline clustering phase, and it combines grid splitting with minimum spanning tree techniques. It can discover clusters of arbitrary shape and number and can effectively eliminate noise data. Experiments show that it improves both the efficiency and the quality of clustering.
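The abstract describes GTSClu only in outline. What follows is a minimal Python sketch of the general grid-plus-MST idea, not the authors' algorithm. It rests on several assumptions:

- The online phase maps each stream point to a uniform grid cell of hypothetical size `cell_size` and keeps only a per-cell count and coordinate sums.
- The offline phase treats cells with at least a hypothetical `min_count` points as dense and reports the rest as noise.
- It builds an MST (Prim's algorithm) over the centroids of the dense cells and cuts every MST edge longer than a hypothetical `max_edge`.
- The remaining connected components are the clusters.

The sketch omits the paper's grid splitting step and any time decay of old stream data, since the abstract does not specify them.

```python
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Index of the uniform grid cell containing `point` (hypothetical fixed cell size)."""
    return tuple(math.floor(x / cell_size) for x in point)


def online_update(grid, point, cell_size):
    """Online phase: fold one stream point into its cell summary [count, coordinate sums]."""
    summary = grid.setdefault(cell_of(point, cell_size), [0, [0.0] * len(point)])
    summary[0] += 1
    for i, x in enumerate(point):
        summary[1][i] += x


def mst_edges(centers):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges (i, j, length)."""
    n = len(centers)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex to the tree
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centers[u], centers[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def offline_cluster(grid, min_count, max_edge):
    """Offline phase: dense cells become MST vertices; MST edges longer than max_edge are cut.

    Returns (clusters, noise): each cluster is a list of cell indices, and
    noise lists the sparse cells (count < min_count).
    """
    dense = [c for c, (cnt, _) in grid.items() if cnt >= min_count]
    noise = [c for c, (cnt, _) in grid.items() if cnt < min_count]
    centers = [[s / grid[c][0] for s in grid[c][1]] for c in dense]

    root = list(range(len(dense)))  # union-find forest over dense cells

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for i, j, length in mst_edges(centers):
        if length <= max_edge:  # keep short edges; cut long ones to separate clusters
            root[find(i)] = find(j)

    groups = defaultdict(list)
    for i, c in enumerate(dense):
        groups[find(i)].append(c)
    return list(groups.values()), noise


# Usage: two dense regions plus one isolated point.
grid = {}
stream = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (1.2, 0.3), (1.1, 0.1),
          (5.1, 5.2), (5.3, 5.4), (5.2, 5.1), (9.5, 0.5)]
for p in stream:
    online_update(grid, p, cell_size=1.0)
clusters, noise = offline_cluster(grid, min_count=2, max_edge=1.5)
print(clusters)  # [[(0, 0), (1, 0)], [(5, 5)]]
print(noise)     # [(9, 0)]
```

Because the offline phase works on cell summaries rather than raw points, a simple O(k²) Prim's algorithm over the k dense cells is used here. The adjacent cells (0, 0) and (1, 0) merge into one cluster, while the lone point in cell (9, 0) falls below `min_count` and is reported as noise.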

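Cite this article

Wang Haibo, Wang Xianpeng, Wang Fang, Chen Zhiguo. A Grid and MST Based Clustering Algorithm for Data Streams. Computer Systems & Applications, 2011, 20(2): 152-156.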

History
  • Received: 2010-06-17
  • Last revised: 2010-07-16