Application of Improved K-Means Algorithm to Analysis of Online Public Opinions
    Abstract:

    To meet the application requirements of online public opinion analysis, this paper first introduces the processing of text information and then discusses the K-means algorithm for text clustering. Because the clustering results of K-means depend on the initial cluster centers, the paper proposes an improvement to the algorithm. Based on the idea that a text's title can express its content, the improved algorithm represents each title as a sparse feature vector, calculates the sparse similarity between titles, and uses it to determine the initial cluster centers. Experiments show that the method improves clustering accuracy. Compared with another algorithm based on the max-min distance principle, the improved method is more efficient while preserving clustering accuracy.
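
    The idea of seeding K-means from title similarity can be illustrated with a minimal sketch. The abstract does not state the exact rule for choosing the centers, so the heuristic below is an assumption: start from the title most similar to all others, then repeatedly add the title least similar to the seeds already chosen. The function names (`sparse_vector`, `cosine`, `pick_seeds`, `kmeans`) are hypothetical, and whitespace tokenization stands in for the word segmentation that Chinese text would need.

```python
import math
from collections import Counter


def sparse_vector(text):
    """Map whitespace-separated tokens to a sparse {term: count} vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pick_seeds(titles, k):
    """Choose k seed documents from title similarity (assumed heuristic)."""
    vecs = [sparse_vector(t) for t in titles]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    # First seed: the title with the highest total similarity to all titles.
    seeds = [max(range(n), key=lambda i: sum(sim[i]))]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        # Next seed: the title least similar to every seed chosen so far.
        seeds.append(min(rest, key=lambda i: max(sim[i][s] for s in seeds)))
    return seeds


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts term by term
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(doc_vecs, seeds, max_iter=20):
    """K-means under cosine similarity, starting from the seed documents."""
    centers = [dict(doc_vecs[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centers)), key=lambda c: cosine(d, centers[c]))
            for d in doc_vecs
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [d for d, l in zip(doc_vecs, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean_vector(members)
    return labels


# Toy usage: titles double as document bodies to keep the example short.
titles = [
    "stock market falls",
    "stock market rally",
    "football match result",
    "football cup final",
]
seeds = pick_seeds(titles, 2)
labels = kmeans([sparse_vector(t) for t in titles], seeds)
print(seeds, labels)
```

    In this sketch the seeds are fixed before any iteration, so repeated runs give the same clustering, unlike K-means with random initialization. Only the short titles are compared during seeding, which is one plausible reading of why the paper reports better efficiency than a max-min distance search over full document vectors.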

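Citation:

Tang Hanqing, Wang Hanjun. Application of Improved K-Means Algorithm to Analysis of Online Public Opinions. Computer Systems & Applications, 2011, 20(3): 165-168, 196.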

History
  • Received: July 7, 2010
  • Revised: August 4, 2010