Received: March 02, 2017  Revised: March 23, 2017
Abstract: Topic models become slow to train on large-scale short-text corpora when the number of topics K is large. To address this, this paper proposes the Fast Biterm Topic Model (FBTM), which reduces the cost of sampling a single biterm in BTM from O(K) to O(1). Because short texts have sparse vocabulary and weak descriptive power, the paper further proposes a short-text classification algorithm that combines same-topic biterms with FBTM. First, FBTM is used for topic modeling, and biterms whose two words share the same topic within a sliding window are appended to the original text as extended features. Second, the FBTM topic distribution of each text is used as a second set of features. Classification experiments on the feature-extended Weibo corpus show that this method significantly improves classification performance.
Keywords: sliding window biterm; fast biterm topic model (FBTM); sampling; feature expansion; short text classification
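The abstract describes two feature sources but gives no procedure. The following is a minimal Python sketch of both under stated assumptions, not the authors' implementation. It assumes each word has already been mapped to one topic (the hypothetical `word_topic` dict, e.g. the argmax of a trained model's word-topic probabilities), that a "window" of size w pairs words whose positions differ by less than w, and that same-topic biterms are encoded as hypothetical `w1_w2` tokens. For the topic-distribution features it uses the standard BTM document inference P(z|d) = Σ_b P(z|b) P(b|d), assuming FBTM keeps BTM's model and only speeds up sampling. The O(1) sampler itself is not shown because the abstract does not describe how it works.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Biterm = Tuple[str, str]


def sliding_window_biterms(tokens: Sequence[str], window: int) -> List[Biterm]:
    """Unordered word pairs whose positions differ by less than `window`.

    Pairs of identical words are skipped (an assumption for feature building).
    """
    pairs: List[Biterm] = []
    for i, w1 in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            w2 = tokens[j]
            if w1 != w2:
                a, b = sorted((w1, w2))
                pairs.append((a, b))
    return pairs


def same_topic_biterm_features(
    tokens: Sequence[str], word_topic: Dict[str, int], window: int = 3
) -> List[str]:
    """Original tokens plus one hypothetical 'w1_w2' token per same-topic biterm."""
    extra: List[str] = []
    for a, b in sliding_window_biterms(tokens, window):
        ta, tb = word_topic.get(a), word_topic.get(b)
        if ta is not None and ta == tb:
            extra.append(f"{a}_{b}")
    return list(tokens) + extra


def btm_doc_topics(
    biterms: Sequence[Biterm],
    theta: Sequence[float],
    phi: Sequence[Dict[str, float]],
) -> List[float]:
    """BTM document inference: P(z|d) = sum_b P(z|b) * P(b|d).

    P(z|b) is proportional to theta[z] * phi[z][w1] * phi[z][w2];
    P(b|d) is the empirical frequency of biterm b in the document.
    """
    k = len(theta)
    if not biterms:
        return [1.0 / k] * k  # assumed fallback for texts with no biterms
    counts = Counter(biterms)
    total = sum(counts.values())
    doc = [0.0] * k
    for (a, b), n in counts.items():
        scores = [theta[z] * phi[z].get(a, 0.0) * phi[z].get(b, 0.0) for z in range(k)]
        s = sum(scores)
        if s == 0.0:
            continue  # biterm unseen by every topic; ignore it
        for z in range(k):
            doc[z] += (scores[z] / s) * (n / total)
    norm = sum(doc)
    return [p / norm for p in doc] if norm > 0 else [1.0 / k] * k


if __name__ == "__main__":
    tokens = ["stock", "market", "rises", "today"]
    word_topic = {"stock": 0, "market": 0, "rises": 0, "today": 1}
    print(same_topic_biterm_features(tokens, word_topic, window=3))
    # ['stock', 'market', 'rises', 'today', 'market_stock', 'rises_stock', 'market_rises']

    theta = [0.5, 0.5]
    phi = [
        {"stock": 0.4, "market": 0.4, "rises": 0.1, "today": 0.1},
        {"stock": 0.1, "market": 0.1, "rises": 0.2, "today": 0.6},
    ]
    print(btm_doc_topics(sliding_window_biterms(tokens, 3), theta, phi))
```

In a full pipeline, the extended token list would be turned into bag-of-words counts and concatenated with the topic vector before training a classifier; that concatenation step and the choice of classifier are likewise assumptions here.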
Citation:
刘泽锦,王洁.同主题词短文本分类算法中BTM的应用与改进.计算机系统应用,2017,26(11):213-219
LIU Ze-Jin, WANG Jie. Application and Improvement of BTM in Short Text Classification Algorithm of the Same Topic. COMPUTER SYSTEMS APPLICATIONS, 2017, 26(11): 213-219