Computer Systems Applications, 2017, 26(11): 118-123
Clustering Algorithm Based on Self-Optimizing Center and Boundary of Classes
(1. Research Institute of Applied Computer Technology, China Women's University, Beijing 100101, China; 2. Mine Construction Institute of Tian-Di Science and Technology, Beijing 100013, China)
Received:February 23, 2017    Revised:March 23, 2017
Abstract: As Internet applications spread and deepen, new application scenarios and data types keep emerging, and many classic clustering algorithms no longer adapt well to them. This has become both a difficult problem and a research focus in data mining. This article therefore proposes a novel clustering algorithm based on self-optimization of class centers and boundaries. The algorithm introduces the "distance-radius" distribution matrix R of the data points and its "cumulative distance-radius" distribution matrix ΣR to characterize the degree of data aggregation. Following a breadth-first principle, it selects as class centers the data points that are minimal in both R and ΣR. It also introduces the "distance-radius partial derivative" distribution matrix R' to describe the looseness between clusters. Again searching breadth-first, it locates the abrupt jump points in R', where the partial derivative is the largest among adjacent points, and takes them as class boundaries; all points inside a boundary belong to that class. Simulation experiments on the classic Aggregation clustering data set show that the algorithm can effectively cluster data sets of various shapes, sizes, and density distributions, identifies isolated points and noise well, and achieves high robustness and accuracy.
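The abstract does not give the exact definitions of R, ΣR, and R', so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that row i of R holds the sorted distances from point i to all other points, that ΣR is the running sum of each row, and that R' is the first difference of each row. The function names, the neighbour count `k`, and the toy data are hypothetical, and the breadth-first search described in the abstract is not reproduced here.

```python
import math

def radius_matrices(points):
    """Return (R, cum_R, dR) for a list of coordinate tuples.

    Assumed reading of the abstract (not confirmed by the source):
    R[i]     : distances from point i to every other point, sorted ascending
    cum_R[i] : running sum of R[i]
    dR[i]    : first differences of R[i], a discrete stand-in for R'
    """
    R, cum_R, dR = [], [], []
    for i, p in enumerate(points):
        row = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        acc, total = [], 0.0
        for r in row:
            total += r
            acc.append(total)
        R.append(row)
        cum_R.append(acc)
        dR.append([b - a for a, b in zip(row, row[1:])])
    return R, cum_R, dR

def densest_point(cum_R, k):
    """Index of the point whose k nearest neighbours are closest in total."""
    return min(range(len(cum_R)), key=lambda i: cum_R[i][k - 1])

def boundary_radius(R_row, dR_row):
    """Radius reached just before the largest jump in the sorted distances."""
    jump = max(range(len(dR_row)), key=lambda t: dR_row[t])
    return R_row[jump]

# Toy data (hypothetical): a tight group of five points and a distant group of three.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

R, cum_R, dR = radius_matrices(points)
center = densest_point(cum_R, k=3)               # candidate class center
radius = boundary_radius(R[center], dR[center])  # candidate class boundary
members = [i for i, p in enumerate(points)
           if math.dist(p, points[center]) <= radius + 1e-9]
```

In this sketch a single center and a single boundary are extracted. Repeating the procedure on the remaining points would be one way to find further classes, though the abstract does not say how the original algorithm iterates.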
Funding: National Key Technology R&D Program of China (2012BAB13B00); Key Project of the China Women's University Research Fund (KG2014-02002)
Citation:
ZHANG Wen-Jun, WANG Jian-Ping, FAN Shi-Ping, ZHANG Liu-Xia. Clustering Algorithm Based on Self-Optimizing Center and Boundary of Classes. Computer Systems Applications, 2017, 26(11): 118-123