Received: August 09, 2022    Revised: September 15, 2022
Abstract: In heterogeneous Hadoop clusters, MapReduce jobs are processed inefficiently for two reasons: erasure-coded and replicated storage are used side by side, and server nodes differ in their real-time computing capacity. To mitigate this, this study implements a scheduling strategy that, when multiple jobs run concurrently, dynamically adjusts how MapReduce tasks are assigned according to where data is stored and the real-time load of each node. The strategy modifies the data placement policy of the current Hadoop framework and dynamically controls the number of tasks each node runs concurrently, which yields a more balanced allocation of resources among concurrent jobs. Experimental results show that, compared with Hadoop's two default job scheduling strategies, the proposed scheduling mode shortens job completion time by about 17% and effectively prevents the starvation that some jobs otherwise experience.
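The abstract does not give the scheduling algorithm itself. Purely as an illustration of the two ideas it names, a per-node concurrency limit driven by real-time load and storage-aware task placement across concurrent jobs, the sketch below assigns tasks round-robin across jobs. It prefers nodes that hold a task's input block and skips nodes whose load-derived limit is exhausted. Everything here is hypothetical: `Node`, `Task`, `schedule`, `base_slots`, the linear load-to-slots rule, and the example cluster are assumptions, not the authors' design. The sketch also does not model the extra cost of reading erasure-coded data.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    base_slots: int   # hypothetical nominal task capacity of the node
    load: float       # hypothetical real-time load in [0, 1], e.g. CPU utilisation
    running: int = 0  # tasks assigned to this node in the current round

    def slot_limit(self) -> int:
        # Assumed rule: the concurrency limit shrinks linearly as load rises.
        return max(0, math.floor(self.base_slots * (1.0 - self.load)))

    def has_capacity(self) -> bool:
        return self.running < self.slot_limit()


@dataclass
class Task:
    job: str
    task_id: int
    local_nodes: set  # nodes holding this task's input block (replica or EC)


def schedule(jobs: dict, nodes: list) -> list:
    """Assign tasks round-robin across jobs so no job monopolises free slots."""
    queues = deque((job, deque(tasks)) for job, tasks in jobs.items() if tasks)
    assignments = []
    while queues:
        job, tasks = queues.popleft()
        candidates = [n for n in nodes if n.has_capacity()]
        if not candidates:
            break  # every node has reached its load-derived limit
        task = tasks.popleft()
        # Prefer data-local nodes; among those, the least loaded, then least busy.
        best = min(
            candidates,
            key=lambda n: (n.name not in task.local_nodes, n.load, n.running),
        )
        best.running += 1
        assignments.append((task.job, task.task_id, best.name))
        if tasks:
            queues.append((job, tasks))  # rotate the job to the back of the queue
    return assignments


# Hypothetical three-node cluster and two concurrent jobs.
nodes = [Node("A", 4, 0.5), Node("B", 4, 0.0), Node("C", 2, 0.9)]
jobs = {
    "wordcount": [Task("wordcount", 0, {"A"}), Task("wordcount", 1, {"A"}),
                  Task("wordcount", 2, {"C"})],
    "sort": [Task("sort", 0, {"B"}), Task("sort", 1, {"A"})],
}
plan = schedule(jobs, nodes)
for job, tid, node in plan:
    print(f"{job}#{tid} -> {node}")
```

With these inputs the heavily loaded node C gets a limit of zero, so the wordcount task whose block lives on C runs non-locally on B. Interleaving the two jobs lets sort obtain slots before wordcount finishes, which is the kind of starvation avoidance the abstract describes.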
Keywords: MapReduce; job scheduling; erasure code; heterogeneous cluster; hybrid storage; cloud computing; load balance; big data
Foundation item: Key Program of the National Natural Science Foundation of China (61832011)
Citation:
杨振宇,牛天洋,吕敏.混合存储模式下MapReduce作业调度.计算机系统应用,2023,32(3):70-85
YANG Zhen-Yu, NIU Tian-Yang, LYU Min. MapReduce Job Scheduling in Hybrid Storage Modes. COMPUTER SYSTEMS APPLICATIONS, 2023, 32(3): 70-85