Abstract: To address data redundancy and low query efficiency in the storage of massive social work data, this study proposes a partition-based neighbor sorting algorithm. Social data collected through different channels and held in different storage formats are integrated into a single massive data set that can be stored in two-dimensional form. A partitioning step segments this data set into clusters, and an improved neighbor sorting algorithm is then applied within each cluster to obtain the final similar duplicate record detection results. Experimental and comparative analysis results show that combining partitioning with the neighbor sorting algorithm improves both the time efficiency and the accuracy of similar duplicate record detection on massive data.
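
The two-stage idea described above (partition first, then sort and compare neighbors within each cluster) can be illustrated with a minimal Python sketch. This is not the improved algorithm of the study, whose details are not given in the abstract; it is the classic sorted-neighborhood scheme under stated assumptions. The function names `similarity` and `detect_duplicates`, the partition and sort keys, the window size, the threshold, and the sample records are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """Mean per-field string similarity in [0, 1] (an illustrative measure)."""
    ratios = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    return sum(ratios) / len(ratios)


def detect_duplicates(records, partition_key, sort_key, window=3, threshold=0.85):
    """Return sorted index pairs (i, j) of records judged similar duplicates."""
    # Step 1: partition record indices into clusters (assumed: by a field value).
    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[partition_key(rec)].append(idx)

    pairs = set()
    for members in clusters.values():
        # Step 2: sort each cluster so likely duplicates end up adjacent.
        members.sort(key=lambda i: sort_key(records[i]))
        # Step 3: compare each record only with the next (window - 1) records.
        for pos, i in enumerate(members):
            for j in members[pos + 1 : pos + window]:
                if similarity(records[i], records[j]) >= threshold:
                    pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


# Hypothetical two-dimensional records: (name, city, phone).
records = [
    ("Zhang Wei", "Beijing", "13800000001"),
    ("Zhang  Wei", "Beijing", "13800000001"),
    ("Li Na", "Shanghai", "13900000002"),
    ("Li Na", "Shanghai", "13900000003"),
    ("Wang Fang", "Beijing", "13700000004"),
]

dups = detect_duplicates(
    records,
    partition_key=lambda r: r[1],                         # cluster by city
    sort_key=lambda r: r[0].replace(" ", "").lower(),     # sort by normalized name
)
print(dups)  # [(0, 1), (2, 3)]
```

Partitioning limits comparisons to records in the same cluster, and the sliding window limits them further to nearby records in sorted order, so the number of pairwise comparisons grows roughly linearly with cluster size rather than quadratically. The trade-off is that duplicates placed in different clusters, or sorted far apart, are missed, which is why the choice of partition and sort keys matters.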