Received: December 12, 2019  Revised: January 3, 2020
Abstract: As mobile text messaging has become an important means of everyday communication, identifying spam messages is of great practical significance. To this end, a self-attention-based Bi-LSTM neural network model combined with TFIDF is proposed. The model first feeds the message text into the Bi-LSTM layer as word vectors. After feature extraction, it combines TFIDF with the information focusing of the self-attention layer to obtain the final feature vector, which is then passed to a Softmax classifier to produce the classification result. Experimental results show that, compared with traditional classification models, the self-attention-based Bi-LSTM model combined with TFIDF improves message recognition accuracy by 2.1% to 4.6% and reduces running time by 0.6 s to 10.2 s.
Keywords: spam messages; text classification; self-attention; Bi-LSTM; TFIDF
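The abstract does not say how the TFIDF values are merged with the self-attention weights, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each token's TF-IDF value is used directly as its attention score, and that `hidden_states` stands in for the per-token outputs of a Bi-LSTM. The function names `tfidf_weights`, `softmax`, and `attention_pool` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return weights


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, token_scores):
    """Weighted sum of per-token vectors, weights = softmax(token_scores).

    Assumption: token_scores are TF-IDF values; a trained model would
    normally learn its attention scores instead.
    """
    alphas = softmax(token_scores)
    dim = len(hidden_states[0])
    return [
        sum(a * h[i] for a, h in zip(alphas, hidden_states))
        for i in range(dim)
    ]


# Toy corpus of tokenized messages (illustrative data only).
docs = [["win", "free", "prize"], ["see", "you", "at", "lunch"], ["free", "lunch"]]
weights = tfidf_weights(docs)

# Placeholder 2-dimensional "Bi-LSTM outputs" for the first message.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [weights[0][tok] for tok in docs[0]]
feature_vector = attention_pool(hidden_states, scores)
```

In this sketch, tokens that are rare across the corpus ("win", "prize") receive larger weights than a token shared by several messages ("free"), so they contribute more to the pooled feature vector that a Softmax classifier would then consume.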
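Funding: National Natural Science Foundation of China (61472256, 61170277, 61003031); Shanghai Key Science and Technology Project (14511107902); Shanghai Engineering Center Construction Project (GCZXL14014); Shanghai First-Class Discipline Construction Project (S1201YLXK, XTKX2021.); Development Project of the Shanghai Key Laboratory of Data Science (201609060003); Hujiang Foundation (A14006); Hujiang Foundation Research Base Special Project (C14001)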
Cite this article:
WU Si-Hui, CHEN Shi-Ping. Spam Message Recognition Based on TFIDF and Self-Attention-Based Bi-LSTM. COMPUTER SYSTEMS APPLICATIONS, 2020, 29(9): 171-177