Received:September 21, 2015 Revised:October 30, 2015
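Chinese abstract (translated): To address the high computational complexity, low recall, and low precision of traditional word co-occurrence methods for topic detection in microblogs, an improved word co-occurrence algorithm based on rough set theory (RSCW) is proposed. A word co-occurrence matrix is built from word co-occurrence relations, maximal complete subgraphs found from the matrix serve as topic cluster centers, and rough set theory is then used to find the keyword set of each topic. Experiments on the NLPIR microblog content corpus and on a microblog data set collected in real time show that the method can effectively detect breaking news in large-scale microblog information and improves the recognition rate of breaking news.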
Abstract: Traditional word co-occurrence methods for detecting news topics in microblogs suffer from high computational complexity, long running time, low recall, and low precision. To address these problems, this paper proposes an improved word co-occurrence detection algorithm based on rough set theory. The method builds a word co-occurrence matrix from word co-occurrence relations, extracts maximal complete subgraphs from the matrix as topic cluster centers, and then uses rough set theory to identify the keyword set of each topic. Experiments on the NLPIR microblog content corpus and on a microblog data set collected in real time show that the method can effectively detect news topics in massive microblog data and supports news topic tracking.
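The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' RSCW implementation: the abstract gives no thresholds or rough set rules, so the function names, the `min_count` and `weak_count` parameters, and the way the lower and upper approximations are formed here are all assumptions made for illustration. Maximal complete subgraphs are enumerated with the basic Bron-Kerbosch procedure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """For each unordered word pair, count the posts that contain both words."""
    counts = Counter()
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    return counts


def build_graph(counts, min_count):
    """Link two words when they co-occur in at least min_count posts (assumed rule)."""
    graph = {}
    for (a, b), n in counts.items():
        if n >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal complete subgraphs with basic Bron-Kerbosch."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def topic_keywords(core, counts, weak_count):
    """Rough-set-style keyword sets (assumed rule, not taken from the paper).

    lower: words certainly in the topic (the clique itself).
    upper: lower plus words co-occurring at least weak_count times
           with any core word (possibly in the topic).
    """
    lower = set(core)
    upper = set(lower)
    for (a, b), n in counts.items():
        if n >= weak_count:
            if a in lower:
                upper.add(b)
            if b in lower:
                upper.add(a)
    return lower, upper
```

Each post is assumed to be already segmented into a list of words. Calling `cooccurrence_counts`, then `build_graph`, then `maximal_cliques` yields candidate topic centers, and `topic_keywords` expands one center into a certain set and a possible set of keywords.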
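Funding: National Natural Science Foundation of China (61372158); Major Project of the Natural Science Research Program of Jiangsu Higher Education Institutions (11KJA520004); Priority Academic Program Development of Jiangsu Higher Education Institutions; Nanjing University of Finance and Economics 2014 Graduate Innovation Research Project (YJS14104).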
Cite this article:
兰天,郭躬德.基于词共现关系和粗糙集的微博话题检测方法.计算机系统应用,2016,25(6):17-24
LAN Tian, GUO Gong-De. News Topic Detection on Chinese Microblog Based on Rough Set and Word Co-Occurrence. Computer Systems Applications, 2016, 25(6): 17-24