Received:April 08, 2011 Revised:May 22, 2011
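Chinese abstract (translated): The rapid development of the Internet has caused the number of web pages to grow explosively, at an exponential rate. Search engines have become an essential tool for finding information among this massive number of pages. This paper proposes an improved duplicate-elimination algorithm based on purified web pages and compares it with traditional duplicate-elimination algorithms. The algorithm combines the respective strengths of keyword search and signature (computed fingerprint) search to eliminate duplicate pages. Experimental results show that the method eliminates duplicate web pages effectively and improves both the recall and the precision of duplicate elimination.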
Abstract: The rapid development of the Internet has led to explosive, exponential growth in the number of web pages. To help users find useful information, search engines have become one of the most important network tools. This paper presents an improved duplicate-elimination algorithm based on purified web pages and compares it with conventional algorithms. The algorithm combines the advantages of the keyword search method and the signature (computed fingerprint) search method to remove duplicate pages. Experimental results show that the algorithm improves both recall and precision.
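The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. Every concrete choice is an assumption made for illustration. "Purification" is modeled as stripping markup, `<script>`, and `<style>`. Keywords are the top-k most frequent word tokens (a real system for Chinese pages would need word segmentation). Signatures are MD5 digests of individual sentences. The thresholds `kw_threshold` and `fp_threshold` are arbitrary. The keyword comparison acts as a cheap pre-filter, and only pairs that pass it are checked against their fingerprint sets.

```python
import hashlib
import re
from collections import Counter
from html.parser import HTMLParser
from itertools import combinations


class _VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def purify(html):
    """Hypothetical purification: drop markup, scripts and styles; collapse whitespace."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def keywords(text, k=10):
    """Assumed keyword model: top-k most frequent tokens, ties broken alphabetically."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def fingerprints(text):
    """Assumed signature model: one MD5 digest per normalized sentence."""
    sentences = (s.strip().lower() for s in re.split(r"[.!?。！？]+", text))
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in sentences if s}


def jaccard(x, y):
    """Set overlap in [0, 1]; two empty sets score 0.0."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def find_duplicates(pages, k=10, kw_threshold=0.5, fp_threshold=0.6):
    """Return sorted (id, id) pairs judged to be near-duplicates.

    pages maps a page id to its raw HTML. Thresholds are illustrative only.
    """
    texts = {pid: purify(html) for pid, html in pages.items()}
    kws = {pid: keywords(t, k) for pid, t in texts.items()}
    fps = {pid: fingerprints(t) for pid, t in texts.items()}
    duplicates = []
    for a, b in combinations(sorted(pages), 2):
        if jaccard(kws[a], kws[b]) < kw_threshold:
            continue  # keyword profiles differ too much: skip the fingerprint comparison
        if jaccard(fps[a], fps[b]) >= fp_threshold:
            duplicates.append((a, b))
    return duplicates
```

For example, two pages with identical visible text but different markup, styles, and scripts are reported as a pair, while an unrelated page is not. Because purification happens before hashing, template noise does not break the fingerprint match. This is the motivation the abstract gives for working on purified pages.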
Citation:
虞曼,熊前兴.基于净化网页的改进消重算法.计算机系统应用,2011,20(12):197-199
YU Man,XIONG Qian-Xing.Improved Duplicate Webpage's Elimination Algorithms Based on Purified Web Pages.COMPUTER SYSTEMS APPLICATIONS,2011,20(12):197-199