Computer Systems Applications, 2017, 26(11): 213-219
Application and Improvement of BTM in Short Text Classification Algorithm of the Same Topic
(Beijing Municipal Key Laboratory of Multimedia and Intelligent Software, Faculty of Information, Beijing University of Technology, Beijing 100124, China)
Received: March 2, 2017    Revised: March 23, 2017
Abstract: Topic models for large-scale short-text corpora are slow to fit when the number of topics K is large. To address this, the paper proposes the FBTM model, which reduces the sampling complexity for a single biterm in BTM from O(K) to O(1). Because short texts contain few words and carry little descriptive information, the paper also proposes a short text classification algorithm that combines same-topic biterms with FBTM. FBTM is first used for topic modeling, and biterms within a sliding window whose words share a topic are added to the original text as features. The FBTM topic distribution then serves as a second group of text features. Classification experiments on the feature-expanded Weibo corpus show that the method significantly improves classification performance.
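The feature-expansion step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how "same topic" is decided, so the code assumes a hypothetical `word_topic` mapping that gives each word a single dominant topic (for example, taken from a fitted FBTM), and a hypothetical `doc_topic_dist` list holding the document's topic distribution. The function names and the `window` parameter are likewise assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def same_topic_biterms(
    tokens: List[str], word_topic: Dict[str, int], window: int = 3
) -> List[Tuple[str, str]]:
    """Collect word pairs inside a sliding window whose words share a topic.

    `word_topic` is an assumed lookup (word -> dominant topic id); the
    abstract does not specify how topics are assigned to words.
    """
    pairs = []
    for i, w1 in enumerate(tokens):
        # Pair w1 with the next (window - 1) tokens only.
        for w2 in tokens[i + 1 : i + window]:
            if w1 == w2:
                continue  # a biterm needs two distinct words
            t1, t2 = word_topic.get(w1), word_topic.get(w2)
            if t1 is not None and t1 == t2:
                pairs.append(tuple(sorted((w1, w2))))  # unordered pair
    return pairs


def expand_features(
    tokens: List[str],
    word_topic: Dict[str, int],
    doc_topic_dist: List[float],
    window: int = 3,
) -> Tuple[List[str], Dict[str, float]]:
    """Return (expanded token list, topic-distribution features)."""
    biterm_feats = [
        f"{a}_{b}" for a, b in same_topic_biterms(tokens, word_topic, window)
    ]
    topic_feats = {f"topic_{k}": p for k, p in enumerate(doc_topic_dist)}
    return tokens + biterm_feats, topic_feats


# Toy example with made-up topic assignments.
tokens = ["apple", "phone", "battery", "movie"]
word_topic = {"apple": 0, "phone": 0, "battery": 0, "movie": 1}
expanded, topic_feats = expand_features(tokens, word_topic, [0.8, 0.2], window=2)
```

The expanded token list and the topic features could then be vectorized and passed to any standard classifier; which classifier the paper uses is not stated in the abstract.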
Citation:
LIU Ze-Jin, WANG Jie. Application and Improvement of BTM in Short Text Classification Algorithm of the Same Topic. Computer Systems Applications, 2017, 26(11): 213-219