Microblog Hot Topics Discovery Method Based on Probabilistic Topic Model
    Abstract:

    Microblog posts are short, structurally complex, and full of deformed (nonstandard) word forms, so the traditional vector space model (VSM) and latent semantic analysis (LSA) are poorly suited to modeling them. This paper proposes a two-stage clustering algorithm that combines probabilistic latent semantic analysis (pLSA) with K-means clustering. It also defines a popularity measure for topics and a mechanism for ranking them. Experiments show that the method clusters topics effectively and can be applied to microblog hot topic detection.
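The abstract does not spell out the two stages, so the following is only a minimal sketch of one plausible reading: fit pLSA by EM to obtain a topic mixture P(z|d) for each post, then run K-means on those mixtures. The function names (`plsa`, `kmeans`), the toy count matrix, the smoothing constant, and the farthest-point initialization of K-means are all illustrative assumptions, not details taken from the paper.

```python
import random

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a document-term count matrix (list of lists).
    Returns (p_z_d, p_w_z): P(z|d) per document and P(w|z) per topic."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random positive initialization, each row normalized to sum to 1.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        # Tiny smoothing constant (an assumption) keeps every row normalizable.
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0:
                    continue
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    r = counts[d][w] * post[z] / norm
                    new_z_d[d][z] += r
                    new_w_z[z][w] += r
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: rows are posts, columns are word counts.
toy_counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 2, 2, 3],
]
doc_topics, topic_words = plsa(toy_counts, n_topics=2)
print(kmeans(doc_topics, k=2))
```

Clustering on P(z|d) rather than on raw term counts is the point of the first stage: short posts that share few literal words can still end up close together in the low-dimensional topic space.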

Citation

Mi Wenli, Sun Yuexin. Microblog Hot Topics Discovery Method Based on Probabilistic Topic Model. Computer Systems & Applications, 2014, 23(8): 163-167

History
  • Received: December 18, 2013
  • Revised: January 14, 2014
  • Online: August 18, 2014