COMPUTER SYSTEMS APPLICATIONS, 2014, 23(12): 125-130
Clustering Ensemble Algorithm for Mixed Attributes Data Based on Relative Density and Entropy
(College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Received: March 27, 2014    Revised: May 04, 2014
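Chinese abstract (translated): Clustering of mixed-attribute data has been a research hotspot in recent years. A clustering algorithm for mixed-attribute data must handle both numeric and categorical attributes well, yet many existing algorithms fail to balance the two attribute types and therefore do not produce satisfactory clustering results. For mixed attributes, this paper proposes an intersection-based clustering ensemble algorithm. The algorithm processes the numeric attributes separately with a relative-density-based algorithm and the categorical attributes with an information-entropy-based algorithm, then fuses the two cluster members with an intersection-based fusion algorithm to obtain the final clustering result. The algorithm is validated on the UCI data set Zoo and compared with the existing k-prototypes and EM algorithms; its clustering accuracy is higher than that of both. The effect of the value of the intersection element ratio in the fusion algorithm on the results is also discussed.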
Chinese keywords (translated): clustering ensemble; mixed attributes; information entropy; relative density
Abstract: Clustering of mixed-attribute data has been a research hotspot in recent years. A clustering algorithm for mixed-attribute data must handle numeric and categorical attributes simultaneously. However, many existing algorithms do not balance the two attribute types well, so their clustering results are unsatisfactory. For mixed-attribute data sets, a new clustering ensemble algorithm based on intersection is proposed. It processes the numeric attributes with a new relative-density clustering algorithm and the categorical attributes with a clustering algorithm based on information entropy. It then fuses the two cluster members with an intersection-based fusion algorithm to obtain the final clustering result. The algorithm is validated experimentally on the UCI data set Zoo and compared with the existing k-prototypes and EM algorithms. The experimental results show that the new algorithm has higher flexibility and accuracy. The influence of the intersection element ratio on the result is also discussed.
Cite this article:
YU Ze. Clustering Ensemble Algorithm for Mixed Attributes Data Based on Relative Density and Entropy. COMPUTER SYSTEMS APPLICATIONS, 2014, 23(12): 125-130