Existing search engines for massive data consume large amounts of memory, take a long time to answer queries, and return results with a high duplication rate. To overcome these disadvantages, this paper proposes a duplicate-removal method based on the Rabin fingerprint algorithm. The method not only removes duplicate URLs but also removes identical, or even similar, web page content hosted at different URLs, thereby speeding up search, saving memory, and improving the accuracy of the search results.
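
To make the idea concrete, the following is a minimal sketch of fingerprint-based duplicate removal, not the implementation evaluated in this paper. It computes a Rabin fingerprint as the remainder of the input, read as a polynomial over GF(2), modulo the irreducible polynomial x^64 + x^4 + x^3 + x + 1. The choice of polynomial, the 4-word shingles, the Jaccard threshold of 0.8, and all function names (`rabin_fingerprint`, `shingle_fingerprints`, `deduplicate`) are assumptions introduced here for illustration only.

```python
# Illustrative sketch only: the polynomial, shingle size, and threshold are assumptions.
DEGREE = 64
POLY = (1 << DEGREE) | 0x1B  # x^64 + x^4 + x^3 + x + 1, irreducible over GF(2)


def rabin_fingerprint(data: bytes) -> int:
    """Return data, read as a polynomial over GF(2), reduced modulo POLY."""
    fp = 0
    for byte in data:
        for i in range(7, -1, -1):
            # Shift in the next message bit, most significant bit first.
            fp = (fp << 1) | ((byte >> i) & 1)
            # If the degree reached 64, subtract (XOR) the modulus.
            if fp >> DEGREE:
                fp ^= POLY
    return fp


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def shingle_fingerprints(text: str, k: int = 4) -> set:
    """Fingerprint every run of k consecutive words (assumed similarity scheme)."""
    words = normalize(text).split()
    if len(words) < k:
        return {rabin_fingerprint(" ".join(words).encode("utf-8"))}
    return {
        rabin_fingerprint(" ".join(words[i:i + k]).encode("utf-8"))
        for i in range(len(words) - k + 1)
    }


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two fingerprint sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(pages, threshold: float = 0.8) -> list:
    """Keep the URL of each page whose URL and content have not been seen before."""
    seen_urls = set()   # fingerprints of URLs already visited
    kept = []           # (url, shingle fingerprint set) of pages kept so far
    for url, content in pages:
        url_fp = rabin_fingerprint(url.strip().encode("utf-8"))
        if url_fp in seen_urls:
            continue    # duplicate URL
        seen_urls.add(url_fp)
        shingles = shingle_fingerprints(content)
        if any(jaccard(shingles, s) >= threshold for _, s in kept):
            continue    # same or similar content at a different URL
        kept.append((url, shingles))
    return [url for url, _ in kept]
```

Storing fixed-size 64-bit fingerprints instead of full URLs and page text is what saves memory in this sketch. The linear scan over previously kept pages is for clarity only; a practical system would need an index over the shingle fingerprints.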