Computer Systems & Applications, 2021, 30(6): 141-147
Short Text Topic Model Based on Semantic Enhancement
(School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China)
Received: October 5, 2020; Revised: November 2, 2020
Abstract: Traditional topic models rely heavily on word co-occurrence patterns to generate document topics. Because short texts lack sufficient context, the resulting data sparsity is the bottleneck that keeps traditional topic models from performing well on them. To address this, this study proposes a short text topic model based on semantic enhancement. The algorithm combines the Dirichlet Multinomial Mixture (DMM) model with a word embedding model: it obtains vector representations of words by training both global and local word embeddings, fuses the global and local embedding vectors to compute the semantic correlation between word vectors using cosine similarity, and then semantically enhances words through the weights of topic-related words. Experiments show that the proposed model is more accurate in topic coherence and improves classification accuracy on short texts.
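As a rough illustration of the similarity and enhancement steps summarized in the abstract, the sketch below fuses each word's global and local embeddings, scores other words by cosine similarity, and adds similarity-weighted counts for related words to a topic. The abstract does not state the fusion rule or the weighting scheme, so the weighted-average fusion (`alpha`), the similarity `threshold`, the promotion weight `mu`, and the helper names `fuse`, `related_words`, and `promote` are all hypothetical. The promotion step borrows the generalized Pólya urn idea used in some embedding-enhanced DMM variants; it is not presented as the authors' exact formula.

```python
import math
from typing import Dict, List

Vector = List[float]


def fuse(global_vec: Vector, local_vec: Vector, alpha: float = 0.5) -> Vector:
    """Hypothetical fusion: weighted average of global and local embeddings.
    The paper's actual fusion rule is not given in the abstract."""
    if len(global_vec) != len(local_vec):
        raise ValueError("global and local vectors must have the same dimension")
    return [alpha * g + (1 - alpha) * l for g, l in zip(global_vec, local_vec)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_words(word: str,
                  global_emb: Dict[str, Vector],
                  local_emb: Dict[str, Vector],
                  threshold: float = 0.7,
                  alpha: float = 0.5) -> Dict[str, float]:
    """Words whose fused-vector similarity to `word` exceeds `threshold`
    (hypothetical selection rule)."""
    target = fuse(global_emb[word], local_emb[word], alpha)
    result: Dict[str, float] = {}
    for other in global_emb:
        if other == word or other not in local_emb:
            continue
        sim = cosine(target, fuse(global_emb[other], local_emb[other], alpha))
        if sim > threshold:
            result[other] = sim
    return result


def promote(topic_word_counts: Dict[int, Dict[str, float]],
            topic: int,
            word: str,
            related: Dict[str, float],
            mu: float = 0.3) -> Dict[int, Dict[str, float]]:
    """Add one count for `word` under `topic`, plus a similarity-weighted
    fractional count (mu * sim) for each related word (hypothetical scheme)."""
    row = topic_word_counts.setdefault(topic, {})
    row[word] = row.get(word, 0.0) + 1.0
    for other, sim in related.items():
        row[other] = row.get(other, 0.0) + mu * sim
    return topic_word_counts


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented purely for illustration.
    global_emb = {"apple": [1.0, 0.0], "fruit": [0.9, 0.1], "car": [0.0, 1.0]}
    local_emb = {"apple": [1.0, 0.2], "fruit": [0.8, 0.2], "car": [0.1, 1.0]}
    rel = related_words("apple", global_emb, local_emb)
    print(rel)
    print(promote({}, 0, "apple", rel))
```

In a full DMM Gibbs sampler, a routine like `promote` would run when a word is assigned to a topic, with the matching decrement applied when the word is removed from that topic before resampling.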
Foundation items: Natural Science Foundation of Shaanxi Province (2019JQ-849); Keqiao Textile Industry Innovation Project (19KQYB23)
Citation:
GAO Juan, ZHANG Xiao-Bin. Short Text Topic Model Based on Semantic Enhancement. COMPUTER SYSTEMS APPLICATIONS, 2021, 30(6): 141-147