Short Text Classification Model Based on Multi-Neural Network Hybrid

doi:10.15888/j.cnki.csa.007493

Authors:
HOU Xue-Liang (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
LI Xin (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
CHEN Yuan-Ping (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)

Fund project: Special Project for Informatization Construction of the Chinese Academy of Sciences (XXH13504-01)

Abstract:

Text classification is the process by which a computer, given a predefined category system, learns to assign texts to categories using a classification algorithm. Text classification algorithms have been applied to web page classification, digital libraries, news recommendation, and other fields. Targeting the characteristics of short text classification tasks, this study proposes a hybrid short text classification model based on multiple neural networks. The model extracts keywords from the short text to reconstruct the text features, feeds the reconstructed features into the multi-neural-network model, and fuses the resulting category vectors, thereby combining the strengths of the FastText and TextCNN models. Experimental results show that, compared with currently popular text classification algorithms, the proposed model performs better on several metrics, including precision, recall, and F1 score.

Keywords: deep learning; short text classification; keyword extraction; feature reconstruction; neural network; FastText; TextCNN
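
The abstract names three steps: keyword extraction, feature reconstruction, and fusion of category vectors. It does not give their details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes TF-IDF as the keyword scorer, reconstruction by appending the keywords to the token sequence, and fusion by a weighted average of two per-category probability vectors. The function names, the `weight` parameter, and the two probability vectors (stand-ins for the outputs of a FastText-style and a TextCNN-style classifier) are all hypothetical.

```python
import math
from collections import Counter


def extract_keywords(tokens, doc_freq, n_docs, top_k=3):
    """Rank the tokens of one short text by TF-IDF and return the top_k.

    Assumption: TF-IDF stands in for whatever keyword extractor is used.
    """
    tf = Counter(tokens)
    scores = {
        tok: (count / len(tokens))
        * math.log((1 + n_docs) / (1 + doc_freq.get(tok, 0)))
        for tok, count in tf.items()
    }
    # Sort by score (descending), then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
    return ranked[:top_k]


def reconstruct_features(tokens, keywords):
    """Hypothetical reconstruction: append the keywords to the token sequence."""
    return list(tokens) + list(keywords)


def fuse_category_vectors(p_a, p_b, weight=0.5):
    """Fuse two per-category probability vectors by a weighted average."""
    if len(p_a) != len(p_b):
        raise ValueError("category vectors must have the same length")
    fused = [weight * a + (1 - weight) * b for a, b in zip(p_a, p_b)]
    total = sum(fused)
    return [v / total for v in fused]  # renormalise so the result sums to 1


# Toy corpus of tokenised short texts (illustrative data only).
corpus = [
    ["stock", "market", "falls"],
    ["team", "wins", "match"],
    ["market", "rally", "lifts", "stock"],
]
doc_freq = Counter(tok for doc in corpus for tok in set(doc))

keywords = extract_keywords(corpus[0], doc_freq, len(corpus), top_k=1)
features = reconstruct_features(corpus[0], keywords)

# Placeholder outputs of two classifiers over two categories (made-up values).
p_fasttext = [0.6, 0.4]
p_textcnn = [0.2, 0.8]
fused = fuse_category_vectors(p_fasttext, p_textcnn)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this sketch `features` would be the input passed to both classifiers, and `predicted` is the index of the category with the highest fused score. A weighted average is only one possible fusion rule; concatenating the two vectors and training a further layer on top is another.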


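Cite this article

HOU Xue-Liang, LI Xin, CHEN Yuan-Ping. Short text classification model based on multi-neural network hybrid. Computer Systems & Applications, 2020, 29(10): 9-19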

History

Received: 2019-12-05
Last revised: 2020-01-03
Published online: 2020-09-30
Publication date: 2020-10-15
