
Computer Systems Applications, 2020, 29(9): 171-177



Spam Message Recognition Based on TFIDF and Self-Attention-Based Bi-LSTM

WU Si-Hui¹, CHEN Shi-Ping²

(1.School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China;2.Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China)

Received: December 12, 2019; Revised: January 3, 2020

Abstract: As mobile text messaging has become an important means of everyday communication, identifying spam messages is of considerable practical significance. To this end, a self-attention-based Bi-LSTM neural network model combined with TFIDF is proposed. The model first feeds the message text into the Bi-LSTM layer as word vectors; after feature extraction, the information focused by the TFIDF and self-attention layers is combined to obtain the final feature vector, which a Softmax classifier then classifies to produce the message classification result. Experimental results show that, compared with traditional classification models, the self-attention-based Bi-LSTM model combined with TFIDF improves message recognition accuracy by 2.1%–4.6% and reduces running time by 0.6 s–10.2 s.

Keywords: spam message; text categorization; self-attention; Bi-LSTM; TFIDF
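
The abstract does not say how the TFIDF information is combined with the self-attention layer, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that per-token hidden states are already available (hand-written stand-ins for Bi-LSTM outputs), that attention probabilities are multiplied by per-token TF-IDF weights and renormalized, and that a single linear layer feeds the Softmax classifier. The function names `tfidf_weights`, `attention_pool`, and `classify`, the smoothed idf variant, and all numeric values are illustrative choices.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf} dict per tokenized document.

    Assumption: tf = count / len(doc), idf = ln((1 + N) / (1 + df)) + 1.
    The abstract does not give the exact formula used in the paper.
    """
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        result.append({
            tok: (c / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, c in counts.items()
        })
    return result


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, query, token_weights):
    """Pool per-token hidden states into a single feature vector.

    hidden: per-token vectors (stand-ins for Bi-LSTM outputs).
    query: attention vector (learned in a real model, fixed here).
    token_weights: per-token TF-IDF weights.
    Assumed combination: attention probabilities times TF-IDF, renormalized.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    attn = softmax(scores)
    mixed = [a * w for a, w in zip(attn, token_weights)]
    total = sum(mixed)
    alphas = [m / total for m in mixed]
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return pooled, alphas


def classify(features, weight_rows, bias):
    """Linear layer followed by softmax; one weight row per class."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weight_rows, bias)]
    return softmax(logits)


# Toy corpus and hand-picked parameters, purely for illustration.
docs = [["win", "free", "prize", "now"],
        ["see", "you", "at", "lunch"],
        ["free", "lunch", "now"]]
weights = tfidf_weights(docs)
hidden = [[0.1, 0.3], [0.4, -0.2], [0.0, 0.5], [0.2, 0.2]]
pooled, alphas = attention_pool(hidden, [1.0, 0.5],
                                [weights[0][tok] for tok in docs[0]])
probs = classify(pooled, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
print(alphas, probs)
```

In this sketch, tokens that are rare across the corpus receive a larger share of the pooled representation than the attention scores alone would give them. In a trained model the hidden states, the attention vector, and the classifier weights would all be learned rather than fixed by hand.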


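Funding: National Natural Science Foundation of China (61472256, 61170277, 61003031); Shanghai Key Science and Technology Project (14511107902); Shanghai Engineering Center Construction Project (GCZXL14014); Shanghai First-Class Discipline Construction Project (S1201YLXK, XTKX2021.); Development Project of the Shanghai Key Laboratory of Data Science (201609060003); Hujiang Foundation (A14006); Hujiang Foundation Research Base Special Project (C14001)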

Author Name	Affiliation	E-mail
WU Si-Hui	School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China	19916546846@163.com
CHEN Shi-Ping	Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China

Cite this article:
WU Si-Hui, CHEN Shi-Ping. Spam Message Recognition Based on TFIDF and Self-Attention-Based Bi-LSTM. Computer Systems Applications, 2020, 29(9): 171-177