本文已被:浏览 259次 下载 884次
Received:January 24, 2024 Revised:February 26, 2024
Received:January 24, 2024 Revised:February 26, 2024
中文摘要: 基于最小生成树(minimum spanning tree, MST)的聚类算法能够识别具有任意形状的簇, 该算法在如何有效构建最小生成树和识别无效边方面存在不足, 而且易受到噪声点影响. 本文利用密度峰值聚类算法思想的优点来寻找局部密度峰, 局部密度峰在保留原始数据集分布结构的同时, 排除了噪声点, 因此, 将局部密度峰与最小生成树聚类算法相结合, 采用标签传播, 提出了基于局部密度峰和标签传播的最小生成树聚类算法(DPMST). 该算法采用了局部密度峰之间基于共享邻的距离, 利用局部密度峰之间的邻域信息, 有效构造最小生成树和识别无效边, 使算法能够发现具有复杂结构的簇. 标签传播增强强标签, 削弱弱标签, 以细化错误的标签, 特别是对于边界点以及揭示复杂流形, 能够提高聚类结果的质量. 人工和真实数据集上的实验结果表明, 与经典聚类算法DPC、MST、K-means、DBSCAN、AP、SC和BIRCH比较, DPMST算法表现优异.
Abstract:Clustering algorithm based on the minimum spanning tree (MST) can identify clusters with arbitrary shapes, but the algorithm has limitations in efficiently constructing a minimum spanning tree and identifying invalid edges and is easily influenced by noise points. This study proposes an MST clustering algorithm based on local density peaks and label propagation (DPMST) by combining the advantages of the density peaks clustering algorithm to find local density peaks and exclude noise points with the MST algorithm. The DPMST algorithm adopts the shared neighbors-based distance between local density peaks and uses the neighborhood information between local density peaks to efficiently construct minimum spanning trees and identify invalid edges, enabling the discovery of clusters with complex structures. Label propagation is used to enhance the strong labels and weaken the weak labels to refine wrong labels, which can improve the quality of clustering results, especially for border region points as well as revealing complex manifolds. The experimental results on several synthetic and real-world datasets show that the DPMST algorithm outperforms classical clustering algorithms DPC, MST, K-means, DBSCAN, AP, SC, and BIRCH.
文章编号: 中图分类号: 文献标志码:
基金项目:国家自然科学基金(12071453); 量子通信与量子计算机重大项目(2021ZD0302902)
引用文本:
林钰莹,侯新民.基于局部密度峰和标签传播的最小生成树聚类.计算机系统应用,2024,33(8):18-29
LIN Yu-Ying,HOU Xin-Min.Minimum Spanning Tree Clustering Based on Local Density Peak and Label Propagation.COMPUTER SYSTEMS APPLICATIONS,2024,33(8):18-29
林钰莹,侯新民.基于局部密度峰和标签传播的最小生成树聚类.计算机系统应用,2024,33(8):18-29
LIN Yu-Ying,HOU Xin-Min.Minimum Spanning Tree Clustering Based on Local Density Peak and Label Propagation.COMPUTER SYSTEMS APPLICATIONS,2024,33(8):18-29