Computer Systems & Applications, 2020, 29(11): 11-20

Malicious URL Detection Based on Semi-Supervised Learning

MA Ou-Bo (麻瓯勃), LIU Xue-Jiao (刘雪娇), TANG Xu-Dong (唐旭栋), ZHOU Yu-Xuan (周宇轩), HU Yi-Cheng (胡亦承)

(Hangzhou Institute of Service Engineering, Hangzhou Normal University, Hangzhou 311121, China)

Received: November 18, 2019; Revised: December 11, 2019

Abstract: Detecting malicious URLs is important for defending against cyber attacks. Because supervised learning requires a large number of labeled samples, this study trains malicious URL detection models with semi-supervised learning, which reduces the cost of labeling data. We propose an improved algorithm based on traditional co-training: two classifiers are trained on data preprocessed by two different methods, expert knowledge and Doc2Vec; samples for which the two classifiers give the same prediction with high confidence are pseudo-labeled and then used for further training of the classifiers. Experimental results show that, with only 0.67% of the data labeled, the proposed method trains two classifiers of different types whose detection precision reaches 99.42% and 95.23%, respectively, which is close to supervised learning performance and better than self-training and co-training.

Keywords: malicious URL detection; semi-supervised learning; co-training; improved algorithm; Doc2Vec; classifier training
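
The agreement-based pseudo-labeling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifiers, the confidence threshold, or the stopping rule, so `CentroidClassifier` (a toy nearest-centroid model), `agreement_co_training`, `threshold=0.9`, and `max_rounds=10` are all assumptions made for illustration. The two views stand in for expert-knowledge features and Doc2Vec embeddings of the same URLs.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class CentroidClassifier:
    """Toy stand-in for a real classifier (assumption): one centroid per class."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x: Vector) -> Dict[int, float]:
        # Softmax over negative distances, shifted by the smallest distance
        # so that the nearest class always has score 1 (avoids underflow).
        dists = {c: math.dist(x, mu) for c, mu in self.centroids.items()}
        nearest = min(dists.values())
        scores = {c: math.exp(-(d - nearest)) for c, d in dists.items()}
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}


def agreement_co_training(
    view_a: List[Vector], view_b: List[Vector], labels: List[int],
    unl_a: List[Vector], unl_b: List[Vector],
    threshold: float = 0.9,   # assumed confidence cut-off
    max_rounds: int = 10,     # assumed stopping rule
) -> Tuple[CentroidClassifier, CentroidClassifier, int]:
    """Pseudo-label samples on which both views agree with high confidence."""
    lab_a, lab_b, y = list(view_a), list(view_b), list(labels)
    pool = list(zip(unl_a, unl_b))
    clf_a, clf_b = CentroidClassifier(), CentroidClassifier()
    for _ in range(max_rounds):
        clf_a.fit(lab_a, y)
        clf_b.fit(lab_b, y)
        remaining, added = [], 0
        for xa, xb in pool:
            pa, pb = clf_a.predict_proba(xa), clf_b.predict_proba(xb)
            la, lb = max(pa, key=pa.get), max(pb, key=pb.get)
            if la == lb and pa[la] >= threshold and pb[lb] >= threshold:
                # Both classifiers agree and are confident: add a pseudo-label.
                lab_a.append(xa)
                lab_b.append(xb)
                y.append(la)
                added += 1
            else:
                remaining.append((xa, xb))
        pool = remaining
        if added == 0 or not pool:
            break
    # Final refit so both classifiers see every pseudo-labeled sample.
    clf_a.fit(lab_a, y)
    clf_b.fit(lab_b, y)
    return clf_a, clf_b, len(y) - len(labels)
```

Calling `agreement_co_training(lab_a, lab_b, y, unl_a, unl_b)` returns the two retrained classifiers and the number of pseudo-labeled samples. Samples on which the two views disagree, or on which either classifier is unsure, stay in the unlabeled pool, which is the filtering step that distinguishes this scheme from plain self-training.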


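Funding: Natural Science Foundation of Zhejiang Province (LY19F020021); Zhejiang Provincial University Students' Science and Technology Innovation Activity Plan (Xinmiao Talent Program) (2019R426035)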

Cite as:
MA Ou-Bo, LIU Xue-Jiao, TANG Xu-Dong, ZHOU Yu-Xuan, HU Yi-Cheng. Malicious URL Detection Based on Semi-Supervised Learning. Computer Systems & Applications, 2020, 29(11): 11-20

Author Name	Affiliation	E-mail
MA Ou-Bo	Hangzhou Institute of Service Engineering, Hangzhou Normal University, Hangzhou 311121, China
LIU Xue-Jiao	Hangzhou Institute of Service Engineering, Hangzhou Normal University, Hangzhou 311121, China	liuxuejiao0406@163.com
TANG Xu-Dong	Hangzhou Institute of Service Engineering, Hangzhou Normal University, Hangzhou 311121, China
ZHOU Yu-Xuan	Hangzhou Institute of Service Engineering, Hangzhou Normal University, Hangzhou 311121, China
HU Yi-Cheng	Hangzhou Institute of Service Engineering, Hangzhou Normal University, Hangzhou 311121, China