Computer Systems Applications, 2016, 25(6): 17-24
News Topic Detection on Chinese Microblog Based on Rough Set and Word Co-Occurrence
(1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China; 2. Key Laboratory of Network Security and Cryptography of Fujian Province, Fujian Normal University, Fuzhou 350007, China)
Received: September 21, 2015; Revised: October 30, 2015
Abstract: Traditional word co-occurrence methods for topic detection on microblogs suffer from high computational complexity, long running time, low recall, and low precision. To address these problems, this paper proposes an improved word co-occurrence algorithm based on rough set theory (RSCW). The method builds a word co-occurrence matrix from word co-occurrence relations, extracts the maximal complete subgraphs of that matrix as topic cluster centers, and then applies rough set theory to identify the keyword set of each topic. Experiments on the NLPIR microblog content corpus and on a microblog data set collected in real time show that the method effectively detects breaking-news topics in large-scale microblog data, improves their recognition rate, and supports news topic tracking.
Keywords: microblog; word co-occurrence graph; rough set; topic detection
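The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the RSCW implementation: the abstract gives no formulas, so the co-occurrence threshold `min_count`, the membership threshold `ratio`, and all function names are hypothetical. The clique search uses the standard Bron-Kerbosch enumeration, and the final step is only a rough-set-style lower/upper approximation assumed for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(posts, min_count=2):
    """Link two words when they appear together in at least min_count posts.

    min_count is an assumed threshold, not a value from the paper.
    """
    pair_counts = Counter()
    for words in posts:
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    graph = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal complete subgraphs (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v is done; exclude it from later branches
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def rough_keyword_sets(graph, core, ratio=0.5):
    """Assumed rough-set-style step: return (lower, upper) keyword sets.

    lower: words that certainly belong to the topic (the clique itself).
    upper: lower plus words linked to at least `ratio` of the clique members.
    """
    lower = set(core)
    upper = set(core)
    for word, neighbours in graph.items():
        if word not in core and len(neighbours & core) >= ratio * len(core):
            upper.add(word)
    return lower, upper


# Toy, made-up posts that are already segmented into words.
posts = [
    ["earthquake", "nepal", "rescue"],
    ["earthquake", "nepal", "rescue", "aid"],
    ["earthquake", "nepal", "aid"],
    ["football", "final", "goal"],
    ["football", "final", "goal"],
]
graph = cooccurrence_graph(posts)
for core in maximal_cliques(graph):
    lower, upper = rough_keyword_sets(graph, core)
    print(sorted(lower), "boundary:", sorted(upper - lower))
```

Each maximal clique acts as a topic cluster center, and the boundary region (upper minus lower) holds words that may belong to the topic without being part of its core. Real microblog text would first need Chinese word segmentation and stop-word filtering, which this sketch omits.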
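Funding: National Natural Science Foundation of China (61372158); Major Project of the Natural Science Research Program of Jiangsu Higher Education Institutions (11KJA520004); Priority Academic Program Development of Jiangsu Higher Education Institutions; 2014 Graduate Innovation Research Project of Nanjing University of Finance and Economics (YJS14104).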
Cite this article:
LAN Tian, GUO Gong-De. News Topic Detection on Chinese Microblog Based on Rough Set and Word Co-Occurrence. Computer Systems Applications, 2016, 25(6): 17-24