Computer Systems Applications, 2020, 29(9): 171-177
Spam Message Recognition Based on TFIDF and Self-Attention-Based Bi-LSTM
(1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; 2. Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China)
Received: 2019-12-12; Revised: 2020-01-03
Abstract: As mobile text messaging has become an important means of everyday communication, identifying spam messages is of real practical significance. To this end, a self-attention-based Bi-LSTM neural network model combined with TFIDF is proposed. The model first feeds the message text into a Bi-LSTM layer as word vectors. After feature extraction, the TFIDF information and the information focusing of the self-attention layer are combined to obtain the final feature vector, which is then passed to a Softmax classifier to produce the classification result. Experimental results show that, compared with traditional classification models, the self-attention-based Bi-LSTM model combined with TFIDF improves message recognition accuracy by 2.1%–4.6% and reduces running time by 0.6 s–10.2 s.
Keywords: spam messages; text classification; self-attention; Bi-LSTM; TFIDF
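The abstract describes the pipeline only at a high level (Bi-LSTM features, then TFIDF plus self-attention pooling, then Softmax) and does not say exactly how the TFIDF weights and the attention weights are combined. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation: the Bi-LSTM outputs are replaced by fixed stand-in vectors, attention scores use the common form score_t = w · tanh(h_t), and the TFIDF combination is *assumed* to be an elementwise product of the attention weights with each token's TF-IDF weight, renormalized to sum to 1. All function names, the toy corpus, and every weight value are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def tfidf_weights(doc: Sequence[str], corpus: Sequence[Sequence[str]]) -> Vector:
    """Per-token TF-IDF weight: tf = count / len(doc), idf = log(N / (1 + df)) + 1."""
    n_docs = len(corpus)
    counts = Counter(doc)
    weights = []
    for tok in doc:
        df = sum(1 for d in corpus if tok in d)  # documents containing the token
        idf = math.log(n_docs / (1 + df)) + 1.0
        weights.append(counts[tok] / len(doc) * idf)
    return weights


def tfidf_attention_pool(hidden: Sequence[Vector], tfidf: Sequence[float],
                         w: Vector) -> Tuple[Vector, Vector]:
    """Pool per-token Bi-LSTM outputs into one feature vector.

    Scores: score_t = w . tanh(h_t), softmaxed over tokens.
    ASSUMPTION (not stated in the abstract): attention weights are multiplied
    by each token's TF-IDF weight and renormalized.
    """
    scores = [sum(wj * math.tanh(hj) for wj, hj in zip(w, h)) for h in hidden]
    attn = softmax(scores)
    mixed = [a * t for a, t in zip(attn, tfidf)]
    z = sum(mixed)
    alpha = [m / z for m in mixed]
    dim = len(hidden[0])
    # Weighted sum of hidden states: the final feature vector.
    context = [sum(alpha[t] * hidden[t][j] for t in range(len(hidden))) for j in range(dim)]
    return context, alpha


def classify(context: Vector, W: Sequence[Vector], b: Vector) -> Vector:
    """Linear layer followed by softmax: returns class probabilities."""
    logits = [sum(wij * cj for wij, cj in zip(row, context)) + bi for row, bi in zip(W, b)]
    return softmax(logits)


# Toy demo: corpus, hidden states and weights are made-up stand-ins.
corpus = [["win", "free", "prize"], ["meeting", "at", "noon"], ["free", "lunch", "at", "noon"]]
doc = ["win", "free", "prize", "now"]
# Stand-in for Bi-LSTM outputs: one 4-dim vector per token
# (e.g. 2 forward + 2 backward units concatenated).
hidden = [[0.2, -0.1, 0.4, 0.0],
          [0.1, 0.3, -0.2, 0.5],
          [0.6, 0.0, 0.1, -0.3],
          [-0.4, 0.2, 0.3, 0.1]]
tfidf = tfidf_weights(doc, corpus)
context, alpha = tfidf_attention_pool(hidden, tfidf, w=[0.5, -0.2, 0.1, 0.3])
# Hypothetical class order: index 0 = ham, index 1 = spam.
probs = classify(context, W=[[1.0, -0.5, 0.3, 0.2], [-1.0, 0.5, -0.3, -0.2]], b=[0.0, 0.0])
print("alpha:", [round(a, 3) for a in alpha])
print("P(ham), P(spam):", [round(p, 3) for p in probs])
```

One consequence of this assumed design: tokens that are frequent across the whole corpus (such as "free" here) receive a lower TF-IDF weight, so even when the attention layer scores all tokens equally, the pooled feature vector leans toward rarer, more distinctive tokens.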
Article ID: 7495
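Foundation items: National Natural Science Foundation of China (61472256, 61170277, 61003031); Shanghai Key Science and Technology Research Project (14511107902); Shanghai Engineering Center Construction Project (GCZXL14014); Shanghai First-Class Discipline Construction Project (S1201YLXK, XTKX2021.); Open Project of the Shanghai Key Laboratory of Data Science (201609060003); Hujiang Foundation (A14006); Hujiang Foundation Research Base Special Project (C14001)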
Citation:
WU Si-Hui, CHEN Shi-Ping. Spam Message Recognition Based on TFIDF and Self-Attention-Based Bi-LSTM. Computer Systems Applications, 2020, 29(9): 171-177