Computer Systems Applications (English edition): 2015, 24(7): 128-131
Application of Duplication Removal Method of Rabin Fingerprint in Search Engine
(College of Computer, Sichuan University of Arts and Science, Dazhou 635000, China)
Received: November 4, 2014    Revised: December 8, 2014
Abstract: Existing search engines are slow when searching massive data, consume large amounts of storage space, and do a poor job of removing duplicate web pages. To address these problems, this paper proposes a deduplication method based on the Rabin fingerprint algorithm. The method not only removes duplicate URLs but also removes identical and similar page content found at distinct, non-duplicate URLs. Experiments show that it effectively increases search speed, saves storage space, and improves search precision.
Keywords: Rabin fingerprint method; search engine; deduplication; URL; massive data
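The two deduplication stages summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the choice of the degree-64 polynomial x^64 + x^4 + x^3 + x + 1, the word-shingle size `k`, and the use of a Jaccard ratio for content similarity are all assumptions made for illustration.

```python
# Hypothetical sketch: all names and parameters below are illustrative
# assumptions, not taken from the paper.

DEGREE = 64
# x^64 + x^4 + x^3 + x + 1, an irreducible polynomial over GF(2) (assumed choice).
IRREDUCIBLE_POLY = (1 << DEGREE) | 0x1B


def rabin_fingerprint(data: bytes) -> int:
    """Read `data` as a polynomial over GF(2) and reduce it modulo IRREDUCIBLE_POLY."""
    fp = 0
    for byte in data:
        for shift in range(7, -1, -1):
            # Append the next message bit (multiply by x, add the bit).
            fp = (fp << 1) | ((byte >> shift) & 1)
            # If the degree reached 64, subtract (XOR) the modulus.
            if fp >> DEGREE:
                fp ^= IRREDUCIBLE_POLY
    return fp


def dedup_urls(urls):
    """Keep the first occurrence of each URL, comparing by fingerprint."""
    seen = set()
    unique = []
    for url in urls:
        fp = rabin_fingerprint(url.encode("utf-8"))
        if fp not in seen:
            seen.add(fp)
            unique.append(url)
    return unique


def shingle_fingerprints(text: str, k: int = 4) -> set:
    """Fingerprint every run of k consecutive words (a 'shingle') in `text`."""
    words = text.split()
    if len(words) < k:
        return {rabin_fingerprint(" ".join(words).encode("utf-8"))}
    return {
        rabin_fingerprint(" ".join(words[i:i + k]).encode("utf-8"))
        for i in range(len(words) - k + 1)
    }


def resemblance(text_a: str, text_b: str, k: int = 4) -> float:
    """Jaccard ratio of the two shingle-fingerprint sets (1.0 means identical sets)."""
    fa = shingle_fingerprints(text_a, k)
    fb = shingle_fingerprints(text_b, k)
    return len(fa & fb) / len(fa | fb)
```

In such a setup, a crawler would first pass candidate URLs through `dedup_urls`, then treat two fetched pages as duplicates when `resemblance` exceeds some threshold. The threshold, like the rest of the sketch, would have to be chosen experimentally.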
Funding: State Archives Administration of China project (2014-X-65)
Citation:
HE Jian-Ying. Application of Duplication Removal Method of Rabin Fingerprint in Search Engine. Computer Systems Applications, 2015, 24(7): 128-131