Computer Systems Applications (English edition): 2012, 21(2): 117-121
Outlier Detection and Semi-Supervised Clustering Algorithm Based on Shared Nearest Neighbors
(Department of Computer Application Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Received: June 15, 2011    Revised: August 09, 2011
Abstract: Traditional clustering is an unsupervised learning process. Its accuracy is affected by the choice of similarity measure and by outliers in the dataset, and it makes little use of prior knowledge, so it cannot reflect the needs of users. This article therefore proposes an outlier detection and semi-supervised clustering algorithm based on shared nearest neighbors. The algorithm uses shared nearest neighbors as the similarity measure, decides whether a data point is an outlier according to its number of nearest neighbors, and then performs semi-supervised clustering on the dataset from which the outliers have been removed. During clustering, it incorporates prior knowledge that has been expanded, and it partitions the dataset according to the principle of graph segmentation. Simulation experiments on real datasets from the UCI repository show that the algorithm detects outliers effectively and achieves good clustering performance.
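The outlier-detection step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact decision rule, so everything specific here is an assumption made for illustration: the function names, the parameters `k`, `min_shared`, and `min_links`, the brute-force Euclidean distances, and the Jarvis-Patrick-style condition that two points are linked only when each appears in the other's k-nearest-neighbor list. It is not the authors' implementation.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def snn_similarity(X, k):
    """Shared-nearest-neighbour similarity matrix.

    sim[i, j] is the number of k-nearest neighbours that i and j have in
    common, kept only when i and j are in each other's neighbour lists
    (an assumed, Jarvis-Patrick-style condition).
    """
    n = X.shape[0]
    member = np.zeros((n, n), dtype=int)  # member[i, j] = 1 if j is in kNN(i)
    rows = np.repeat(np.arange(n), k)
    member[rows, knn_indices(X, k).ravel()] = 1
    shared = member @ member.T            # size of kNN(i) intersected with kNN(j)
    mutual = member * member.T            # 1 only for mutual neighbours
    return shared * mutual

def detect_outliers(X, k=3, min_shared=1, min_links=1):
    """Flag points with fewer than min_links sufficiently similar neighbours.

    The thresholds are hypothetical; the abstract only says that the
    number of nearest neighbours decides whether a point is an outlier.
    """
    sim = snn_similarity(X, k)
    links = (sim >= min_shared).sum(axis=1)
    return links < min_links

if __name__ == "__main__":
    # Five points in a tight group plus one point far away.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
    print(detect_outliers(X, k=3))
```

In this toy dataset only the sixth, far-away point is flagged, because it appears in no other point's neighbor list. The remaining points could then be passed to a semi-supervised, graph-based clustering step, which this sketch does not attempt to reproduce.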
Citation:
ZHENG Ling-Zhi, HUANG De-Cai. Outlier Detection and Semi-Supervised Clustering Algorithm Based on Shared Nearest Neighbors. Computer Systems Applications, 2012, 21(2): 117-121