COMPUTER SYSTEMS APPLICATIONS, 2011, 20(1): 80-83
Initialization Algorithm of Clustering Using Subsample for KD-Tree
(Department of Computer Science and Technology, Guangdong University of Finance, Guangzhou 510521, China)
Received: April 27, 2010    Revised: May 29, 2010
Abstract: When initializing clustering on large data sets, random subsampling is an important data-reduction step. This paper analyzes the process, properties, and shortcomings of random sampling and proposes a clustering initialization method based on KD-tree subsamples. The method uses a KD-tree to recursively partition the sample space into multiple subspaces and then samples randomly within each subspace to form the KD-tree subsample. This avoids the skewed distribution that a plain random subsample can exhibit, so that good initial centroids found on the subsample also represent the cluster structure of the whole data set well. Simulation results show that the initial centroids selected by this method are closer to the desired cluster centers and yield higher clustering accuracy.
Keywords: clustering initialization; KD-tree; subsample; K-means algorithm
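The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `kd_leaves` and `kd_tree_subsample`, the split rule (median of the widest dimension), and the parameters `leaf_size` and `fraction` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def kd_leaves(X, idx, leaf_size):
    """Recursively split the points X[idx] into KD-tree leaf cells.

    Returns a list of index arrays, one per leaf cell.
    """
    if len(idx) <= leaf_size:
        return [idx]
    pts = X[idx]
    # Assumption: split on the dimension with the widest range, at the median.
    dim = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
    order = idx[np.argsort(pts[:, dim], kind="stable")]
    mid = len(order) // 2
    return (kd_leaves(X, order[:mid], leaf_size)
            + kd_leaves(X, order[mid:], leaf_size))

def kd_tree_subsample(X, leaf_size=64, fraction=0.1, seed=0):
    """Draw a random sample inside every leaf cell and pool the results.

    Every cell contributes at least one point, so sparse regions are
    not missed the way they can be with one global random draw.
    """
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    rng = np.random.default_rng(seed)
    picked = []
    for leaf in kd_leaves(X, np.arange(len(X)), leaf_size):
        k = max(1, int(round(fraction * len(leaf))))
        picked.append(rng.choice(leaf, size=k, replace=False))
    return np.sort(np.concatenate(picked))

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(1000, 2))
    sample_idx = kd_tree_subsample(data, leaf_size=50, fraction=0.1)
    print(len(sample_idx), "points selected out of", len(data))
```

The returned indices could then be passed to any K-means initializer that runs on the subsample `data[sample_idx]`; how the paper selects initial centroids from the subsample is not stated in the abstract, so that step is left out here.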
Citation:
PAN Zhang-Ming. Initialization Algorithm of Clustering Using Subsample for KD-Tree. COMPUTER SYSTEMS APPLICATIONS, 2011, 20(1): 80-83