Computer Systems Applications (计算机系统应用), 2018, 27(6): 151-157

Improved Distributed Topic Classification Model Based on Tensor Decomposition

MA Nian-Sheng (马年圣)¹, BIAN Yi-Jie (卞艺杰)¹, TANG Ming-Wei (唐明伟)²

(1. Business School, Hohai University, Nanjing 211100, China; 2. School of Management Science and Engineering, Nanjing Audit University, Nanjing 211815, China)

Views: 1901; downloads: 2512
Received: October 09, 2017; Revised: November 01, 2017

Abstract: To address the long computation time and reduced classification accuracy that arise when classifying large-scale data, this study proposes using tensor decomposition to estimate the parameters of the LDA topic model, supporting the collection, classification, and mining of massive network data. Using the method of moments, the LDA estimation problem is transformed into a low-dimensional tensor decomposition problem. Parameters are passed on through decomposition and reflection, and the big data platform Spark is used for distributed computation. The experimental results show that the improved parameter estimation method performs better in both running time and perplexity, and that it presents classification information more intuitively, making it better suited to large-scale network data classification.

Keywords: LDA topic model; tensor decomposition; Spark; data classification
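
The abstract names tensor decomposition as the core computation but gives no detail on it. As a rough illustration only, the sketch below runs a plain tensor power iteration with deflation on a toy symmetric third-order tensor whose components are orthonormal. It is not the authors' Spark implementation: it assumes the moment tensor has already been reduced to an orthogonally decomposable form, omits that reduction step and any random restarts, and all function names (`tensor_apply`, `power_iteration`, `deflate`, `decompose`, `build_tensor`) are hypothetical.

```python
import math
import random


def tensor_apply(T, v):
    """Contract a 3rd-order tensor with v twice: result[i] = sum_jk T[i][j][k] v[j] v[k]."""
    n = len(v)
    return [
        sum(T[i][j][k] * v[j] * v[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(T, n_iter=100, seed=0):
    """Find one (eigenvalue, eigenvector) pair from a random unit start vector."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(T))])
    for _ in range(n_iter):
        v = normalize(tensor_apply(T, v))
    # The eigenvalue is the full contraction T(v, v, v).
    lam = sum(a * b for a, b in zip(tensor_apply(T, v), v))
    return lam, v


def deflate(T, lam, v):
    """Subtract the recovered rank-one component lam * (v x v x v) from T."""
    n = len(v)
    return [
        [[T[i][j][k] - lam * v[i] * v[j] * v[k] for k in range(n)] for j in range(n)]
        for i in range(n)
    ]


def decompose(T, rank):
    """Recover `rank` components one at a time by power iteration plus deflation."""
    pairs = []
    for r in range(rank):
        lam, v = power_iteration(T, seed=r)
        pairs.append((lam, v))
        T = deflate(T, lam, v)
    return pairs


def build_tensor(weights, vectors):
    """Build sum_r weights[r] * (u_r x u_r x u_r) for the toy example."""
    n = len(vectors[0])
    T = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for w, u in zip(weights, vectors):
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    T[i][j][k] += w * u[i] * u[j] * u[k]
    return T


# Toy input (an assumption for illustration): three orthonormal components.
toy = build_tensor([3.0, 2.0, 1.0], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
pairs = decompose(toy, rank=3)
for lam, v in pairs:
    print(round(lam, 6), [round(x, 6) for x in v])
```

The order in which the components come out depends on the random start vector, so downstream code should not rely on it. In a topic-model setting the recovered pairs would still have to be mapped back to topic weights and topic-word distributions, a step this sketch leaves out.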

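Funding: National Natural Science Foundation of China, Young Scientists Fund (71603114); Jiangsu Provincial Social Science Foundation, Youth Project (16TQC004); China Postdoctoral Science Foundation, General Program (2015M581776)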

Author Name	Affiliation	E-mail
MA Nian-Sheng	Business School, Hohai University, Nanjing 211100, China	MacMargo@163.com
BIAN Yi-Jie	Business School, Hohai University, Nanjing 211100, China
TANG Ming-Wei	School of Management Science and Engineering, Nanjing Audit University, Nanjing 211815, China

Cite this article:
MA Nian-Sheng, BIAN Yi-Jie, TANG Ming-Wei. Improved Distributed Topic Classification Model Based on Tensor Decomposition. COMPUTER SYSTEMS APPLICATIONS, 2018, 27(6): 151-157