Abstract:With the explosive growth of Internet information, researches on search engine and big data call for an efficient, stable and scalable crawler architecture to collect and analyze Internet data. Inspired by peer to peer network, we use distributed hash table as a carrier of communication between nodes, while a distributed hash table implementation-Kademlia protocol is modified and improved to meet the needs of the distributed crawler cluster's scalability and fault tolerance. In the experiments, we carried out multi-threaded experiment on single computer and node expansion experiment on distributed cluster. From system performance and system load point of view, the experimental results show the effectiveness of this kind of distributed cluster.