计算机系统应用 (Computer Systems Applications), 2011, 20(3): 165-168, 196
Application of Improved K-Means Algorithm to Analysis of Online Public Opinions
(1.Graduate University, Chinese Academy of Sciences, Beijing 100049, China;2.Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110171, China)
Received: July 07, 2010; Revised: August 04, 2010
Abstract: Motivated by the application requirements of online public opinion analysis, this paper first introduces text information processing and then discusses the K-means algorithm for text clustering. Because K-means clustering results depend on the choice of initial cluster centers, the paper improves the algorithm. Based on the idea that a document's title can represent its content, the improved algorithm represents each text title as a sparse feature vector, computes the sparse similarity between titles, and uses it to determine the initial cluster centers. Experiments show that the improved K-means algorithm increases clustering accuracy. Compared with an initial-center selection algorithm based on the maximum-minimum distance principle, it improves execution efficiency while maintaining clustering accuracy.
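The abstract does not spell out the exact similarity formula or seeding rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes term-frequency sparse vectors stored as dicts (only non-zero terms kept), uses cosine similarity over non-zero terms as a stand-in for the paper's "sparse similarity", and uses a hypothetical greedy rule (`title_seeds`) that picks the title most similar to all others first, then repeatedly adds the title least similar to the seeds already chosen. The documents whose titles are chosen become the initial centers for an ordinary K-means loop on the full-text vectors. The function names `sparse_vec`, `cosine`, `title_seeds`, `centroid`, and `kmeans` are illustrative.

```python
import math
from collections import Counter

Vec = dict[str, float]  # sparse vector: term -> weight, zero terms omitted


def sparse_vec(tokens: list[str]) -> Vec:
    """Term-frequency sparse vector (assumed weighting; TF-IDF would also work)."""
    return dict(Counter(tokens))


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; the dot product iterates only over the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def title_seeds(titles: list[Vec], k: int) -> list[int]:
    """Hypothetical seeding rule on title vectors: start from the most
    'central' title, then greedily add the title least similar to the seeds."""
    n = len(titles)
    sim = [[cosine(titles[i], titles[j]) for j in range(n)] for i in range(n)]
    seeds = [max(range(n), key=lambda i: sum(sim[i]))]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(sim[i][s] for s in seeds)))
    return seeds


def centroid(vecs: list[Vec]) -> Vec:
    """Mean of sparse vectors, still stored sparsely."""
    total: Counter = Counter()
    for v in vecs:
        total.update(v)
    return {t: w / len(vecs) for t, w in total.items()}


def kmeans(docs: list[Vec], seeds: list[int], max_iter: int = 20) -> list[int]:
    """K-means with cosine similarity; the seed documents are the initial centers."""
    centers = [docs[i] for i in seeds]
    labels = [-1] * len(docs)
    for _ in range(max_iter):
        # Assign each document to its most similar center.
        new = [max(range(len(centers)), key=lambda c: cosine(d, centers[c]))
               for d in docs]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(len(centers)):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = centroid(members)
    return labels


if __name__ == "__main__":
    titles = [sparse_vec(t.split()) for t in [
        "stock market falls", "stock market rises",
        "football team wins", "football team loses"]]
    docs = [sparse_vec(t.split()) for t in [
        "stock market falls as shares drop", "stock market rises as shares gain",
        "football team wins the match", "football team loses the match"]]
    seeds = title_seeds(titles, k=2)
    print(kmeans(docs, seeds))  # → [0, 0, 1, 1]
```

The design choice mirrors the efficiency argument in the abstract: titles are short, so their sparse vectors have few non-zero terms and pairwise similarities are cheap to compute, whereas a maximum-minimum distance initializer applied to full document vectors pays that cost on much longer vectors.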
Citation: TANG Han-Qing, WANG Han-Jun (汤寒青, 王汉军). Application of Improved K-Means Algorithm to Analysis of Online Public Opinions. COMPUTER SYSTEMS APPLICATIONS, 2011, 20(3): 165-168, 196.