Distributed Crawler Based on the Improved Kademlia Protocol

    Abstract:

    With the explosive growth of information on the Internet, research on search engines and big data calls for an efficient, stable, and scalable crawler architecture to collect and analyze Internet data. Inspired by peer-to-peer networks, we use a distributed hash table (DHT) as the communication carrier between nodes, and we modify and improve Kademlia, a DHT implementation, to meet the scalability and fault-tolerance needs of a distributed crawler cluster. We carried out a multi-threaded experiment on a single computer and a node-expansion experiment on a distributed cluster. In terms of both system performance and system load, the experimental results show the effectiveness of this kind of distributed cluster.
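
To make the role of the distributed hash table concrete, the following is a minimal sketch of the standard Kademlia XOR distance metric applied to crawler task assignment. It is not the modified protocol from the paper, whose details the abstract does not give. The function names (`key_id`, `xor_distance`, `bucket_index`, `responsible_node`), the node names, and the rule "the node closest to a URL's key crawls it" are all assumptions made for illustration.

```python
import hashlib
from typing import Sequence

ID_BITS = 160  # standard Kademlia uses 160-bit identifiers


def key_id(text: str) -> int:
    """Map a string (node name or URL) to a 160-bit integer via SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers: bitwise XOR read as an integer."""
    return a ^ b


def bucket_index(a: int, b: int) -> int:
    """Index of the k-bucket that b falls into from a's point of view.

    This is the position of the highest differing bit, or -1 if a == b.
    """
    return xor_distance(a, b).bit_length() - 1


def responsible_node(url: str, nodes: Sequence[str]) -> str:
    """Hypothetical assignment rule: the node whose ID is XOR-closest to the URL key."""
    target = key_id(url)
    return min(nodes, key=lambda name: xor_distance(key_id(name), target))


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c"]  # hypothetical node names
    url = "http://example.com/page"
    owner = responsible_node(url, cluster)
    print("assigned to:", owner)
    # If the owner leaves, the URL falls to the next-closest surviving node.
    survivors = [n for n in cluster if n != owner]
    print("after failure:", responsible_node(url, survivors))
```

Under this assumed rule, adding or removing a node only moves the URLs whose closest node changes, which is the property that makes a DHT attractive for scaling and fault tolerance in a crawler cluster.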

Citation

Tao Yaodong, Xiang Zhongxi. Distributed Crawler Based on the Improved Kademlia Protocol. Computer Systems & Applications, 2016, 25(4): 156-161

History
  • Received: July 21, 2015
  • Revised: September 14, 2015
  • Online: April 19, 2016
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing, Postal Code: 100190
Phone: 010-62661041 Email: csa (a) iscas.ac.cn