Improved Distributed Topic Classification Model Based on Tensor Decomposition

doi:10.15888/j.cnki.csa.006394

Authors:

MA Nian-Sheng
Business School, Hohai University, Nanjing 211100, China
BIAN Yi-Jie
Business School, Hohai University, Nanjing 211100, China
TANG Ming-Wei
School of Management Science and Engineering, Nanjing Audit University, Nanjing 211815, China

Fund Project: National Natural Science Foundation of China, Young Scientists Fund (71603114); Jiangsu Social Science Fund, Youth Project (16TQC004); China Postdoctoral Science Foundation, General Program (2015M581776)


Abstract:

To address the long computation time and reduced classification accuracy that arise when classifying large-scale data, this study estimates the parameters of the LDA topic model by tensor decomposition, supporting the collection, classification, and mining of massive network data. Using the method of moments, LDA parameter estimation is transformed into a low-dimensional tensor decomposition problem. Parameters are passed along by decomposition and reflection, and the computation is distributed on the big data platform Spark. The experimental results show that the improved parameter estimation method performs better in both running time and perplexity, presents classification information more intuitively, and is better suited to classifying large-scale network data.

Keywords: LDA topic model; tensor decomposition; Spark; data classification
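The abstract does not spell out the decomposition step, so the following is only a minimal illustrative sketch, not the authors' Spark implementation. It assumes the simplest setting used in moment-based topic modelling: a small symmetric third-order tensor that is an exact weighted sum of orthonormal rank-one terms, recovered by tensor power iteration with deflation. Estimating the moments from a corpus, whitening them, and distributing the work are all omitted, and every function name here (`build_tensor`, `contract`, `power_iteration`, `decompose`) is hypothetical.

```python
import math
import random


def build_tensor(weights, vectors):
    """Return T[i][j][k] = sum_r weights[r] * v_r[i] * v_r[j] * v_r[k]."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, v in zip(weights, vectors):
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] += w * v[i] * v[j] * v[k]
    return T


def contract(T, u):
    """Compute the vector T(I, u, u): sum over j, k of T[i][j][k] * u[j] * u[k]."""
    d = len(u)
    return [
        sum(T[i][j][k] * u[j] * u[k] for j in range(d) for k in range(d))
        for i in range(d)
    ]


def normalize(u):
    """Scale u to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def power_iteration(T, rng, n_iter=50):
    """Run u <- T(I, u, u) / ||T(I, u, u)|| from a random unit vector."""
    d = len(T)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    for _ in range(n_iter):
        u = normalize(contract(T, u))
    # The eigenvalue estimate is T(u, u, u).
    lam = sum(a * b for a, b in zip(contract(T, u), u))
    return lam, u


def deflate(T, lam, u):
    """Subtract the recovered rank-one term lam * u (x) u (x) u from T."""
    d = len(u)
    return [
        [[T[i][j][k] - lam * u[i] * u[j] * u[k] for k in range(d)]
         for j in range(d)]
        for i in range(d)
    ]


def decompose(T, rank, seed=0):
    """Recover `rank` (weight, vector) pairs by repeated power iteration."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(rank):
        lam, u = power_iteration(T, rng)
        pairs.append((lam, u))
        T = deflate(T, lam, u)
    return pairs


# Synthetic example: three orthonormal components with positive weights.
weights = [3.0, 2.0, 1.0]
vectors = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8], [0.0, -0.8, 0.6]]
for lam, u in decompose(build_tensor(weights, vectors), rank=3):
    print(round(lam, 6), [round(x, 6) for x in u])
```

In this toy setting each recovered pair matches one of the planted weight and vector pairs, though the order in which they are found depends on the random start. With real data the tensor is only approximately of this form, so multiple restarts and a more careful deflation would be needed.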


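Cite this article

MA Nian-Sheng, BIAN Yi-Jie, TANG Ming-Wei. Improved Distributed Topic Classification Model Based on Tensor Decomposition. Computer Systems & Applications (计算机系统应用), 2018, 27(6):151-157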


History

Received: 2017-10-09
Revised: 2017-11-01
Published online: 2018-05-29
