面向纠删码的高性能冗余转换机制
作者:
基金项目:

国家自然科学基金面上项目(62172382)


High Performance Redundancy Transitioning Scheme for Erasure Coding
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [30]
  • |
  • 相似文献 [20]
  • |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    分布式存储系统采用纠删码来实现高可靠和低开销的数据存储. 为了提供不同的可靠性和多样的访问性能, 存储系统需要对纠删码数据进行冗余转换, 即改变其编码参数. 条带合并机制为存储系统的冗余转换提供了一种思路. 然而, 基于传统纠删码的条带合并会在过程中引发大量的数据块重分布和校验块重计算I/O开销, 且在多次合并中会进一步加剧I/O. 针对此问题, 本文提出了一种新的树型里德-所罗门 (TRS)码, 通过分散数据块以消除数据块重分布I/O, 并通过设计编码矩阵以节约校验块重计算I/O. 树型里德-所罗门码进一步设计了存储单元, 将参与合并的条带组织成一棵树, 使得多次合并依据树结构自底向上高效完成. 本文设计实现了分布式存储原型系统. 实验表明, 树型里德-所罗门码相较于传统纠删码, 可以大大减少条带合并的完成时间.

    Abstract:

    Distributed storage systems achieve high-reliability and low-overhead data storage by erasure code. To provide different reliability and access performance, storage systems need to perform redundancy transitions on erasure code data by changing coding parameters. The stripe merging mechanism provides a way for redundancy transitioning in storage systems. However, the stripe merging process based on traditional erasure code can result in a large amount of data block redistribution and checksum block re-computation I/O overhead. Worst still, the I/O will be amplified in multiple merging operations. In response to these issues, this study proposes new Tree Reed-Solomon (TRS) codes that eliminate data block redistribution I/O by decentralizing data blocks, and save checksum block re-computation I/O by designing coding matrices. TRS codes further design storage units to organize the stripes taking part in merging into a tree, enabling multiple merging operations to be efficiently completed from bottom to top based on tree structure. To test the performance of TRS codes, this study designs and implements a distributed storage prototype. Experiments have shown that compared to other erasure codes, TRS codes can greatly reduce stripe merging operation time.

    参考文献
    [1] Huang C, Simitci H, Xu YK, et al. Erasure coding in Windows Azure storage. Proceedings of the 2012 USENIX Conference on Annual Technical Conference. Boston: USENIX Association, 2012. 15–26.
    [2] Ghemawat S, Gobioff H, Leung ST. The Google file system. Proceedings of the 19th ACM Symposium on Operating Systems Principles. Bolton: ACM, 2003. 29–43.
    [3] Rashmi KV, Shah NB, Gu DK, et al. A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster. Proceedings of the 5th USENIX Conference on Hot Topics in Storage and File Systems. San Jose: USENIX Association, 2013. 8.
    [4] Yao QR, Hu YC, Cheng LF, et al. StripeMerge: Efficient wide-stripe generation for large-scale erasure-coded storage. Proceedings of the 41st IEEE International Conference on Distributed Computing Systems. Washington: IEEE, 2021. 483–493.
    [5] Ford D, Labelle F, Popovici FI, et al. Availability in globally distributed storage systems. Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation Vancouver: USENIX Association, 2010. 61–74.
    [6] Chen HB, Zhang H, Dong MK, et al. Efficient and available in-memory KV-store with hybrid erasure coding and replication. Proceedings of the 14th USENIX Conference on File and Storage Technologies. Santa Clara: USENIX Association, 2016. 167–180.
    [7] Yiu MMT, Chan HHW, Lee PPC. Erasure coding for small objects in in-memory KV storage. Proceedings of the 10th ACM International Systems and Storage Conference. Haifa: ACM, 2017. 14.
    [8] Weatherspoon H, Kubiatowicz JD. Erasure coding vs. replication: A quantitative comparison. Proceedings of the 1st International Workshop on Peer-to-peer Systems. Cambridge: Springer, 2002. 328–337.
    [9] Chiniah A, Mungur A. On the adoption of erasure code for cloud storage by major distributed storage systems. EAI Endorsed Transactions on Cloud Systems, 2022, 7(21): e1.
    [10] Qiao Y, Zhang MH, Zhou Y, et al. NetEC: Accelerating erasure coding reconstruction with in-network aggregation. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(10): 2571–2583.
    [11] Nachiappan R, Calheiros RN, Matawie KM, et al. Optimized proactive recovery in erasure-coded cloud storage systems. IEEE Access, 2023, 11: 38226–38239.
    [12] Cheng LF, Hu YC, Lee PPC. Coupling decentralized key-value stores with erasure coding. Proceedings of the 2019 ACM Symposium on Cloud Computing. Santa Cruz: ACM, 2019. 377–389.
    [13] Ren YJ, Ren YM, Li XL, et al. ELECT: Enabling erasure coding tiering for LSM-tree-based storage. Proceedings of the 22nd USENIX Conference on File and Storage Technologies. Santa Clara: USENIX Association, 2024. 18.
    [14] Hu YC, Zhang XY, Lee PPC, et al. NCScale: Toward optimal storage scaling via network coding. IEEE/ACM Transactions on Networking, 2022, 30(1): 271–284.
    [15] Hu YC, Cheng LF, Yao QR, et al. Exploiting combined locality for wide-stripe erasure coding in distributed storage. Proceedings of the 19th USENIX Conference on File and Storage Technologies. USENIX Association, 2021. 233–248.
    [16] Plank JS. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems. Software: Practice and Experience, 1997, 27(9): 995–1012.
    [17] Plank JS, Ding Y. Note: Correction to the 1997 tutorial on Reed-Solomon coding. Software: Practice and Experience, 2005, 35(2): 189–194.
    [18] Xia MY, Saxena M, Blaum M, et al. A tale of two erasure codes in HDFS. Proceedings of the 13th USENIX Conference on File and Storage Technologies. Santa Clara: USENIX Association, 2015. 213–226.
    [19] Li J, Li BC. Demand-aware erasure coding for distributed storage systems. IEEE Transactions on Cloud Computing, 2021, 9(2): 532–545.
    [20] Yang JC, Yue Y, Rashmi KV. A large-scale analysis of hundreds of in-memory key-value cache clusters at twitter. ACM Transactions on Storage, 2021, 17(3): 17.
    [21] Taranov K, Alonso G, Hoefler T. Fast and strongly-consistent per-item resilience in key-value stores. Proceedings of the 13th EuroSys Conference. Porto: ACM, 2018. 39.
    [22] Chandrashekhara S, Kumar MR, Venkataramaiah M, et al. Cider: A case for block level variable redundancy on a distributed flash array. Proceedings of the 26th International Conference on Computer Communication and Networks. Vancouver: IEEE, 2017. 1–9.
    [23] Wu S, Du QP, Lee PPC, et al. Optimal data placement for stripe merging in locally repairable codes. Proceedings of the 2022 IEEE Conference on Computer Communications. London: IEEE, 2022. 1669–1678.
    [24] Wu S, Shen ZR, Lee PPC, et al. Elastic reed-solomon codes for efficient redundancy transitioning in distributed key-value stores. IEEE/ACM Transactions on Networking, 2024, 32(1): 670–685.
    [25] Xue HX, Wu CT, Li J, et al. Cauchy-merge: An efficient cauchy matrix based stripe merging method for Reed-Solomon codes. Proceedings of the 38th International Conference on Massive Storage Systems and Technology. Santa Clara, 2024.
    [26] Wu S, Lin GT, Lee PPC, et al. Optimal wide stripe generation in locally repairable codes via staged stripe merging. Proceedings of the 44th International Conference on Distributed Computing Systems. Jersey City: IEEE, 2024. 450–460.
    [27] Chen YP, Alspaugh S, Katz R. Interactive analytical processing in big data systems: A cross-industry study of MapReduce workloads. Proceedings of the 2012 VLDB Endowment, 2012, 5(12): 1802–1813.
    [28] Abad CL, Roberts N, Lu Y, et al. A storage-centric analysis of MapReduce workloads: File popularity, temporal locality and arrival patterns. Proceedings of the 2012 IEEE International Symposium on Workload Characterization. La Jolla: IEEE, 2012. 100–109.
    [29] Berg B, Berger DS, McAllister S, et al. The CacheLib caching engine: Design and experiences at scale. Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association, 2020. 753–768.
    [30] Memcached. https://memcached.org/blog. (2024-03-27)[2024-07-28].
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

柏志伟,吕敏.面向纠删码的高性能冗余转换机制.计算机系统应用,2025,34(2):111-121

复制
分享
文章指标
  • 点击次数:90
  • 下载次数: 386
  • HTML阅读次数: 70
  • 引用次数: 0
历史
  • 收稿日期:2024-06-28
  • 最后修改日期:2024-07-25
  • 在线发布日期: 2024-11-28
文章二维码
您是第11198266位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京海淀区中关村南四街4号 中科院软件园区 7号楼305房间,邮政编码:100190
电话:010-62661041 传真: Email:csa (a) iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号