
Computer Systems Applications, 2017, 26(2): 207-211


Illegitimate Website Detection Based on Multi-Dimensional Features

TIAN Shuang-Zhu^1,2,3, CHEN Yong^3, YAN Zhi-Wei^3, LI Xiao-Dong^3

(1. University of Chinese Academy of Sciences, Beijing 100049, China; 2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; 3. National Engineering Laboratory of Internet Domain Name Management Technology, China Internet Network Information Center, Beijing 100190, China)


Received: May 17, 2016; Revised: June 27, 2016

Abstract: Illegitimate websites are currently detected mainly with machine learning methods whose features are drawn from web page content such as the URL (Uniform Resource Locator), keywords, and images. However, operators of illegitimate websites evade such detection by changing URLs, avoiding common illicit keywords, and hiding images from search crawlers, so content-based methods miss some of these sites. To detect such websites more accurately, this paper proposes features derived from domain registration and domain name resolution, and builds detection models with mainstream machine learning methods. When the models were used to classify a new dataset, the results showed that the method based on resolution and registration features effectively detects the evasive illegitimate websites described above within a website collection, while still accurately identifying ordinary illegitimate websites. This study offers another approach to illegitimate website detection.

Keywords: resolution; registration; illegitimate website detection
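The abstract does not list the individual registration and resolution features or name the classifiers, so the Python sketch below is only an illustration under stated assumptions. The five features (domain age, registration term, a WHOIS privacy flag, the number of distinct resolved IPv4 addresses, and the record TTL), the `DomainRecord` type, the helper names `extract_features`, `fit_scaler`, `train_logistic` and `toy_demo`, and the choice of logistic regression trained by gradient descent are all hypothetical stand-ins, not the authors' feature set or model. What it shows is the general shape of the approach: attributes taken from a domain's registration and DNS data, rather than from page content, are turned into a numeric vector and fed to a standard classifier.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class DomainRecord:
    """Hypothetical registration (WHOIS-style) and resolution (DNS) data for one domain."""
    created: date            # registration creation date
    expires: date            # registration expiry date
    privacy_protected: bool  # registrant identity hidden behind a privacy service
    a_record_ips: List[str]  # IPv4 addresses returned when the domain is resolved
    ttl_seconds: int         # TTL of the resolved A record set


def extract_features(rec: DomainRecord, today: date) -> List[float]:
    """Map a record to a numeric vector. Every feature here is an illustrative assumption."""
    age_days = max((today - rec.created).days, 0)
    term_days = max((rec.expires - rec.created).days, 0)
    return [
        math.log1p(age_days),                   # registration: domain age
        math.log1p(term_days),                  # registration: length of registration term
        1.0 if rec.privacy_protected else 0.0,  # registration: privacy-protected registrant
        float(len(set(rec.a_record_ips))),      # resolution: number of distinct IPs
        math.log1p(rec.ttl_seconds),            # resolution: record TTL
    ]


def fit_scaler(X: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean and population standard deviation of the training features."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # guard constant columns
            for c, m in zip(cols, means)]
    return means, stds


def scale(x: List[float], means: List[float], stds: List[float]) -> List[float]:
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by batch gradient descent on the mean cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a scaled feature vector belongs to an illegitimate site."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def toy_demo():
    """Train on made-up records (NOT data from the paper) and score two unseen domains."""
    today = date(2016, 6, 1)
    ips = lambda prefix, k: [f"{prefix}.{i}" for i in range(1, k + 1)]
    train = [  # label 1 = illegitimate, 0 = legitimate
        (DomainRecord(date(2005, 1, 1), date(2020, 1, 1), False, ips("192.0.2", 1), 86400), 0),
        (DomainRecord(date(2008, 3, 15), date(2018, 3, 15), False, ips("192.0.2", 2), 3600), 0),
        (DomainRecord(date(2016, 5, 1), date(2017, 5, 1), True, ips("198.51.100", 5), 60), 1),
        (DomainRecord(date(2016, 4, 10), date(2017, 4, 10), True, ips("203.0.113", 8), 300), 1),
    ]
    raw = [extract_features(r, today) for r, _ in train]
    means, stds = fit_scaler(raw)
    X = [scale(x, means, stds) for x in raw]
    w, b = train_logistic(X, [label for _, label in train])
    train_preds = [int(predict_proba(w, b, x) > 0.5) for x in X]
    suspect = DomainRecord(date(2016, 5, 20), date(2017, 5, 20), True, ips("198.51.100", 6), 120)
    benign = DomainRecord(date(2010, 1, 1), date(2020, 1, 1), False, ips("192.0.2", 1), 43200)
    score = lambda r: predict_proba(w, b, scale(extract_features(r, today), means, stds))
    return score(suspect), score(benign), train_preds


if __name__ == "__main__":
    p_suspect, p_benign, preds = toy_demo()
    print(f"suspect={p_suspect:.3f} benign={p_benign:.3f} train_preds={preds}")
```

Logistic regression is used here only because it fits in a few self-contained lines; the abstract does not say which machine learning methods the authors chose. Because every feature in this sketch comes from registration or DNS data, hiding keywords or images from crawlers leaves the feature vector unchanged, which is the property the abstract relies on. The records in `toy_demo` are invented and say nothing about the paper's results.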



Cite this article:
TIAN Shuang-Zhu, CHEN Yong, YAN Zhi-Wei, LI Xiao-Dong. Illegitimate Website Detection Based on Multi-Dimensional Features. COMPUTER SYSTEMS APPLICATIONS, 2017, 26(2): 207-211