Received: June 23, 2021 Revised: July 14, 2021
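Chinese abstract (translated): In Spark, cached checkpoint data is cleaned up by the programmer only after the job has finished running, which can leave stale data accumulating in memory. This paper analyzes the checkpoint execution mechanism and shows by modeling that this cleaning method does not scale as the number of checkpoints grows. It proposes a checkpoint cache utility entropy model that senses how well checkpoint caches match memory slots, and uses the principle of best utility matching to derive the optimal time to clean a checkpoint cache. The utility-entropy-based parallel checkpoint cache cleaning (PCC) strategy optimizes memory resources by making the cache cleaning time approximately equal to the time at which the checkpoint is written to HDFS. Experimental results show that, in a multi-job execution environment based on fair scheduling, the execution efficiency of the unoptimized program worsens as the number of checkpoints increases. With the PCC strategy, program execution time, power consumption, and GC time are reduced by up to 10.1%, 9.5%, and 19.5%, respectively, effectively improving execution efficiency when many checkpoints are used.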
Abstract: In Spark, cached checkpoint data is cleaned only after the job finishes, and the cleanup is left to the programmer, so stale data may accumulate and occupy memory. This study analyzes the checkpoint execution mechanism and derives, through modeling, that this cleaning method does not scale as the number of checkpoints grows. A utility entropy model of the checkpoint cache is used to measure how well checkpoint caches match memory slots, and the optimal cache cleaning time is derived from the principle of best utility matching. The resulting utility-entropy-based parallel checkpoint cache cleaning (PCC) strategy optimizes memory resources by making the cache cleaning time approximately equal to the time at which the checkpoint is written to HDFS. Experimental results show that, in a multi-job execution environment based on fair scheduling, the execution efficiency of the unoptimized program degrades as the number of checkpoints increases. With the PCC strategy, program execution time, power consumption, and GC time are reduced by up to 10.1%, 9.5%, and 19.5%, respectively, which effectively improves execution efficiency when many checkpoints are used.
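The memory argument in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' utility entropy model or their PCC implementation. It only compares the peak memory held by checkpoint caches under two release policies: releasing every cache at job end, and releasing each cache as soon as its checkpoint has been written to HDFS. The `Checkpoint` class, the `peak_cached_mb` function, and all timings and sizes are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Checkpoint:
    cached_at: float   # time the data is cached in memory (arbitrary units)
    written_at: float  # time the checkpoint write to HDFS completes
    size_mb: float     # memory held by the cached copy


def peak_cached_mb(checkpoints: List[Checkpoint],
                   release_time: Callable[[Checkpoint], float]) -> float:
    """Peak memory held by checkpoint caches for a given release policy."""
    events = []
    for cp in checkpoints:
        events.append((cp.cached_at, cp.size_mb))         # cache is allocated
        events.append((release_time(cp), -cp.size_mb))    # cache is released
    # At equal timestamps, apply releases (negative deltas) before allocations.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0.0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical workload: five checkpoints of 64 MB, each written 8 time units
# after it is cached, in a job that ends at t = 100.
job_end = 100.0
cps = [Checkpoint(10.0 * i, 10.0 * i + 8.0, 64.0) for i in range(5)]

baseline = peak_cached_mb(cps, lambda cp: job_end)        # clean at job end
eager = peak_cached_mb(cps, lambda cp: cp.written_at)     # clean at write time

print(f"peak with cleanup at job end:    {baseline} MB")
print(f"peak with cleanup at write time: {eager} MB")
```

In this toy setting the job-end policy holds all five caches at once, while the write-time policy holds at most one, which is the kind of growth with checkpoint count that the abstract describes.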
Funding: Henan Province Science and Technology Research and Development Project (212102210078)
Citation:
宋一鑫,于俊洋,何欣,王锦江.Spark效用感知的检查点缓存并行清理策略.计算机系统应用,2022,31(4):253-259
SONG Yi-Xin, YU Jun-Yang, HE Xin, WANG Jin-Jiang. Parallel Cleaning Strategy of Checkpoint Cache Based on Spark Utility Aware. Computer Systems Applications, 2022, 31(4): 253-259