Abstract:The problem of lossy graph summarization is as follows: Given a graph G=(V, E) and a positive integer k, it is required to merge all nodes in graph G into k super nodes so that the resulting summary graph composed of these super nodes can represent the original graph G within a certain error range. As a combinatorial optimization problem based on graph partitioning, this problem is usually solved by randomly extracting node pairs successively and using heuristic methods to select node pairs for merging. This study proposes an effective two-stage algorithm, namely TS_LGS. The algorithm first sets the stage threshold according to the average degree of graph G. Specifically, in the first stage, the current number of super nodes is greater than the stage threshold, and the algorithm selects node pairs among the sampled node pairs in batch for merging based on the current best merging score, so as to effectively reduce the number of iterations; in the second stage, the algorithm preferentially selects adjacent node pairs based on weighted sampling, so as to merge the node pairs with small reconstruction error increment until the number of super nodes is k. The experimental results on several typical real network instances show that TS_LGS can extract graph summarization with lower reconstruction and query errors on the premise of lower time complexity compared with the existing best SAA algorithm.