An Improved Dynamic k-Means Clustering Algorithm
Author: Hu Wei (胡伟)

    Abstract:

    Classical k-means clustering can only handle static data. To address this limitation, this paper proposes an improved dynamic k-means clustering algorithm for dynamic data, called the Dynamical K-means algorithm. Building on classical k-means, the method analyzes each sample newly added to a dynamically changing data set and, depending on how much the clustering objective function actually changes, either updates the most similar cluster locally or reruns classical k-means globally. In this way it distinguishes the cases in which clustering concept drift has occurred from those in which it has not, and clusters dynamic data online. It also avoids the main cost of applying classical k-means to dynamic data, namely reclustering the entire data set every time the data change. Experimental results on standard data sets and on an artificial social network data set show that, compared with classical k-means, the proposed method clusters dynamic data quickly and efficiently and effectively detects the concept drift that arises during dynamic clustering.
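The local-versus-global decision described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the drift criterion, so the rule used here (the objective increase caused by a new sample is compared with `drift_ratio` times the current mean per-sample cost) is an assumption, as are the names `DynamicKMeans`, `drift_ratio`, and `global_runs` and the deterministic farthest-first initialization. The local update relies on a standard identity: adding a sample x to a cluster of n points with centroid c raises the sum of squared errors by n/(n+1) * ||x - c||^2.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, max_iter=100):
    """Classical (Lloyd) k-means with farthest-first initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


class DynamicKMeans:
    """Hypothetical online wrapper: local update unless the objective jumps."""

    def __init__(self, points, k, drift_ratio=3.0):
        self.k = k
        self.drift_ratio = drift_ratio  # assumed threshold, not from the paper
        self.points = [list(p) for p in points]
        self.global_runs = 0
        self._recluster()

    def _recluster(self):
        """Global step: rerun classical k-means on all data seen so far."""
        self.centroids, self.labels = kmeans(self.points, self.k)
        self.counts = [self.labels.count(j) for j in range(self.k)]
        self.sse = sum(sq_dist(p, self.centroids[lab])
                       for p, lab in zip(self.points, self.labels))
        self.global_runs += 1

    def add(self, x):
        """Process one new sample; returns 'local' or 'global'."""
        x = list(x)
        j = nearest(x, self.centroids)  # most similar cluster
        n = self.counts[j]
        # Exact increase of the objective if x simply joins cluster j.
        delta = n / (n + 1) * sq_dist(x, self.centroids[j])
        mean_cost = self.sse / len(self.points)
        self.points.append(x)
        if delta > self.drift_ratio * mean_cost:
            self._recluster()  # treated as concept drift
            return "global"
        # Local update: incremental mean of the affected cluster only.
        self.centroids[j] = [(c * n + xi) / (n + 1)
                             for c, xi in zip(self.centroids[j], x)]
        self.counts[j] += 1
        self.labels.append(j)
        self.sse += delta
        return "local"


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    model = DynamicKMeans(data, k=2)
    print(model.add((0.5, 0.4)))  # sample close to an existing cluster
    print(model.add((30, 30)))    # sample far from every cluster
```

In this sketch a local update touches one centroid and costs time proportional to k, whereas the global step reprocesses every stored sample, which is the cost the abstract says the method avoids when no drift is detected. The value of `drift_ratio` controls that trade-off and would need tuning on real data.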

Cite this article

Hu Wei (胡伟). An improved dynamic k-means clustering algorithm. 计算机系统应用 (Computer Systems & Applications), 2013, 22(5): 116-121

History
  • Received: 2012-10-22
  • Last revised: 2012-12-01