Computer Systems Applications, English edition: 2011, 20(12): 197-199
Improved Duplicate Webpage's Elimination Algorithms Based on Purified Web Pages
(College of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China)
Received: April 8, 2011    Revised: May 22, 2011
Abstract: The rapid development of the Internet has led to explosive, exponential growth in the number of web pages. To address the problem of finding information among this mass of pages, search engines have become an important tool for using the Internet. This paper proposes an improved duplicate-elimination algorithm based on purified web pages and compares it with conventional duplicate-elimination algorithms. The algorithm combines the respective advantages of keyword-based search and signature-based (computed fingerprint) search to remove duplicate web pages. Experimental results show that the method eliminates duplicate pages effectively and improves both the recall and the precision of duplicate elimination.
Keywords: duplicate web page elimination; purified web pages; keywords; signature
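The abstract gives no implementation details, so the following is only a minimal sketch of how a keyword check and a fingerprint check might be combined on purified pages. Everything in it is an assumption made for illustration and is not the authors' algorithm: the helper names `purify`, `top_keywords`, `fingerprint`, and `is_duplicate`, the choice of MD5 over the sorted top keywords, the Jaccard overlap test, and the values `n=5` and `threshold=0.6`.

```python
import hashlib
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def purify(html):
    """Hypothetical purification step: drop markup, keep normalized text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def top_keywords(text, n=5):
    """Return the n most frequent words (ties broken alphabetically)."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2]
    ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(word for word, _ in ranked[:n])


def fingerprint(keywords):
    """Signature of a page: MD5 over its sorted keywords (an assumption)."""
    joined = " ".join(sorted(keywords))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def is_duplicate(html_a, html_b, n=5, threshold=0.6):
    """Fast path: equal fingerprints. Fallback: keyword overlap (Jaccard)."""
    keys_a = set(top_keywords(purify(html_a), n))
    keys_b = set(top_keywords(purify(html_b), n))
    if not keys_a or not keys_b:
        return False  # nothing to compare on an empty page
    if fingerprint(keys_a) == fingerprint(keys_b):
        return True
    overlap = len(keys_a & keys_b) / len(keys_a | keys_b)
    return overlap >= threshold
```

In this sketch the fingerprint comparison catches pages whose purified keyword sets are identical, while the overlap test is a looser fallback for near-duplicates. A real system would need proper word segmentation for Chinese text and tuned thresholds.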
Citation:
YU Man, XIONG Qian-Xing. Improved Duplicate Webpage's Elimination Algorithms Based on Purified Web Pages. COMPUTER SYSTEMS APPLICATIONS, 2011, 20(12): 197-199