Initialization Algorithm of Clustering Using Subsample for KD-Tree
    Abstract:

    When initializing clustering on large data sets, random sampling is an important data-reduction step. This paper examines the process and properties of random sampling and proposes a new sampling method based on KD-tree subsamples. The sample space is divided into several subspaces using a KD-tree, and a subsample is drawn from each subspace, which overcomes the skewness of plain random samples. The resulting initial centroids therefore describe the cluster structure of the whole data set well. Experimental results show that the initial centroids selected by the new method are closer to the desired cluster centers and that better clustering accuracy is achieved.
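
The subspace-wise sampling described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's actual procedure: the abstract does not specify the split rule or the per-subspace sample size, so the median split on the widest dimension, the proportional sampling rate, and the names `kd_partition`, `kd_subsample`, `leaf_size`, and `fraction` are all hypothetical choices made for this example.

```python
import numpy as np


def kd_partition(points, indices, leaf_size):
    """Split `indices` into KD-tree leaves (assumed rule: median split
    along the dimension with the largest spread)."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(indices) <= leaf_size:
        return [indices]
    subset = points[indices]
    # Pick the dimension whose value range is widest in this node.
    dim = int(np.argmax(subset.max(axis=0) - subset.min(axis=0)))
    order = indices[np.argsort(subset[:, dim], kind="stable")]
    mid = len(order) // 2
    return (kd_partition(points, order[:mid], leaf_size)
            + kd_partition(points, order[mid:], leaf_size))


def kd_subsample(points, fraction, leaf_size=64, seed=0):
    """Draw roughly `fraction` of the points from every KD-tree leaf,
    so that no region of the data is left out of the sample."""
    rng = np.random.default_rng(seed)
    leaves = kd_partition(points, np.arange(len(points)), leaf_size)
    chosen = []
    for leaf in leaves:
        # At least one point per leaf, never more than the leaf holds.
        n_take = min(len(leaf), max(1, int(round(fraction * len(leaf)))))
        chosen.append(rng.choice(leaf, size=n_take, replace=False))
    return np.concatenate(chosen)


if __name__ == "__main__":
    data = np.random.default_rng(42).normal(size=(1000, 2))
    sample_idx = kd_subsample(data, fraction=0.1, leaf_size=50)
    print(len(sample_idx), "points sampled from", len(data))
```

Because every leaf contributes points, the subsample follows the spatial spread of the full data set more evenly than a single uniform draw of the same size. The returned indices could then be handed to any centroid-initialization routine; that step is left out here because the abstract does not describe it.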


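Pan Zhangming (潘章明). Initialization Algorithm of Clustering Using Subsample for KD-Tree. Computer Systems & Applications (计算机系统应用), 2011, 20(1): 80-83.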

History
  • Received: April 27, 2010
  • Revised: May 29, 2010