Two-stage Algorithm for Lossy Graph Summarization
CSTR:
Author:
  • Article
  • | |
  • Metrics
  • |
  • Reference [26]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    The problem of lossy graph summarization is as follows: Given a graph G=(V, E) and a positive integer k, it is required to merge all nodes in graph G into k super nodes so that the resulting summary graph composed of these super nodes can represent the original graph G within a certain error range. As a combinatorial optimization problem based on graph partitioning, this problem is usually solved by randomly extracting node pairs successively and using heuristic methods to select node pairs for merging. This study proposes an effective two-stage algorithm, namely TS_LGS. The algorithm first sets the stage threshold according to the average degree of graph G. Specifically, in the first stage, the current number of super nodes is greater than the stage threshold, and the algorithm selects node pairs among the sampled node pairs in batch for merging based on the current best merging score, so as to effectively reduce the number of iterations; in the second stage, the algorithm preferentially selects adjacent node pairs based on weighted sampling, so as to merge the node pairs with small reconstruction error increment until the number of super nodes is k. The experimental results on several typical real network instances show that TS_LGS can extract graph summarization with lower reconstruction and query errors on the premise of lower time complexity compared with the existing best SAA algorithm.

    Reference
    [1] Tian YY, Hankins RA, Patel JM. Efficient aggregation for graph summarization. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. Vancouver: ACM, 2008. 567–580.
    [2] Maserrat H, Pei J. Community preserving lossy compression of social networks. Proceedings of the 12th IEEE International Conference on Data Mining. Brussels: IEEE, 2012. 509–518.
    [3] Shin K, Ghoting A, Kim M, et al. SWeG: Lossless and lossy summarization of Web-scale graphs. Proceedings of the World Wide Web Conference. San Francisco: ACM, 2019. 1679–1690.
    [4] Bonifati A, Dumbrava S, Kondylakis H. Graph summarization. arXiv. 2004.14794, 2020.
    [5] 王鹤. 基于图摘要的图模式挖掘研究与实现[硕士学位论文]. 南京: 东南大学, 2018.
    [6] Rissanen J. Modeling by shortest data description. Automatica, 1978, 14(5): 465–471. [doi: 10.1016/0005-1098(78)90005-5
    [7] Lelewer DA, Hirschberg DS. Data compression. ACM Computing Surveys, 1987, 19(3): 261–296. [doi: 10.1145/45072.45074
    [8] Navlakha S, Rastogi R, Shrivastava N. Graph summarization with bounded error. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. Vancouver: ACM, 2008. 419–432.
    [9] Ko J, Kook Y, Shin K. Incremental lossless graph summarization. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM, 2020. 317–327.
    [10] Koutra D, Kang U, Vreeken J, et al. VOG: Summarizing and understanding large graphs. Proceedings of the 2014 SIAM International Conference on Data Mining. Philadelphia: Society for Industrial and Applied Mathematics, 2014. 91–99.
    [11] LeFevre K, Terzi E. GraSS: Graph structure summarization. Proceedings of the 2010 SIAM International Conference on Data Mining. Columbus: Society for Industrial and Applied Mathematics. 2010. 454–465.
    [12] Beg MA, Ahmad M, Zaman A, et al. Scalable approximation algorithm for graph summarization. Proceedings of the 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Melbourne: Springer, 2018. 502–514.
    [13] Riondato M, García-Soriano D, Bonchi F. Graph summarization with quality guarantees. Data Mining and Knowledge Discovery, 2017, 31(2): 314–349. [doi: 10.1007/s10618-016-0468-8
    [14] Lee K, Jo H, Ko J, et al. SSumM: Sparse summarization of massive graphs. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM, 2020. 144–154.
    [15] Zhou HQ, Liu SH, Lee K, et al. DPGS: Degree-preserving graph summarization. Proceedings of the 2021 SIAM International Conference on Data Mining (SDM). Society for Industrial and Applied Mathematics. Waltham, 2021. 280–288.
    [16] Fan WF, Li JZ, Wang X, et al. Query preserving graph compression. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. Scottsdale: ACM, 2012. 157–168.
    [17] Zhuang H, Rahman R, Hu X, et al. Data summarization with social contexts. Proceedings of the 25th ACM International Conference on Information and Knowledge Management. Indianapolis: ACM, 2016. 397–406.
    [18] Chierichetti F, Kumar R, Lattanzi S, et al. On compressing social networks. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris: ACM, 2009. 219–228.
    [19] Safavi T, Belth C, Faber L, et al. Personalized knowledge graph summarization: From the cloud to your pocket. Proceedings of the 2019 IEEE International Conference on Data Mining. Beijing: IEEE, 2019. 528–537.
    [20] 王健. 基于张量分解的高效图摘要算法研究[硕士学位论文]. 武汉: 华中科技大学, 2021.
    [21] McNeil M, Ma BY, Bogdanov P. SAGA: Signal-aware graph aggregation. Proceedings of the 2022 SIAM International Conference on Data Mining. Alexandria: Society for Industrial and Applied Mathematics. 2022. 136–144.
    [22] Shin K. Mining large dynamic graphs and tensors [Ph.D. Thesis]. Pittsburgh: Carnegie Mellon University, 2019.
    [23] 徐正祥, 王英, 汪洪吉, 等. 基于特征加强的异构网络潜在摘要模型. 计算机科学与探索, 2022, 16(11): 2537–2546. [doi: 10.3778/j.issn.1673-9418.2104081
    [24] Liu YK, Safavi T, Dighe A, et al. Graph summarization methods and applications: A survey. ACM Computing Surveys, 2019, 51(3): 62
    [25] Ahmadi N, Sand H, Papotti P. Unsupervised matching of data and text. Proceedings of the 38th IEEE International Conference on Data Engineering. Kuala Lumpur: IEEE, 2022. 1058–1070.
    [26] Cormode G, Muthukrishnan S. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms, 2005, 55(1): 58–75. [doi: 10.1016/j.jalgor.2003.12.001
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

冯康,陈卫东.图的有损摘要问题的两阶段算法.计算机系统应用,2023,32(6):189-196

Copy
Share
Article Metrics
  • Abstract:549
  • PDF: 1241
  • HTML: 818
  • Cited by: 0
History
  • Received:December 05,2022
  • Revised:January 06,2023
  • Online: April 07,2023
Article QR Code
You are the first990364Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address:4# South Fourth Street, Zhongguancun,Haidian, Beijing,Postal Code:100190
Phone:010-62661041 Fax: Email:csa (a) iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063