Received: June 29, 2020 Revised: July 27, 2020
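Abstract (translated from the Chinese): MapReduce-based programs are increasingly used in large-scale data analysis applications. Apache Hadoop is one of the most widely used open-source MapReduce implementations. Reducing running time is crucial for MapReduce programs and for all data-processing applications, and accurately estimating the execution time of a MapReduce program is an important step in optimizing it. This paper defines a performance model that accurately estimates the execution time of MapReduce job workloads in Hadoop 2.x. The model comprises a precedence tree model and a queueing network model, which capture, respectively, the dependencies among the tasks of a MapReduce job and the synchronization constraints within the job. Finally, experiments demonstrate the usability of the model.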
Keywords: MapReduce performance model; Hadoop2.x; queueing model; mean value algorithm
Abstract: MapReduce-based systems are increasingly used for large-scale data analysis applications. Apache Hadoop is one of the most common open-source implementations of this paradigm. Minimizing execution time is vital for MapReduce, as for all data-processing applications, and accurate estimation of execution time is essential for optimization. This study presents a MapReduce performance model for Hadoop 2.x that accurately estimates the execution time of MapReduce workloads. The model combines a precedence tree model, which captures the dependencies between the tasks of a single MapReduce job, with a queueing network model, which captures the intra-job synchronization constraints. Such an analytical performance model is a particularly attractive tool because it might provide reasonably accurate job response times at significantly lower cost than simulation experiments on real data-analysis systems. Furthermore, a clear understanding of job response time under different circumstances is key to making decisions in MapReduce workload management and resource capacity planning.
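The abstract does not give the model's equations, so the following is only a generic illustration of how a queueing network model can yield a job response time analytically. It sketches the textbook exact mean value analysis recursion for a closed, single-class network of queueing stations. It is not the paper's model: it omits the precedence tree and the intra-job synchronization constraints, and the function name `mva_closed_network`, the station names, and the service demands are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def mva_closed_network(
    demands: Dict[str, float],
    n_jobs: int,
    think_time: float = 0.0,
) -> Tuple[float, float, Dict[str, float]]:
    """Exact mean value analysis for a closed single-class queueing network.

    demands    -- service demand (seconds per job) at each queueing station
                  (hypothetical stations; not taken from the paper)
    n_jobs     -- number of jobs circulating in the network (>= 1)
    think_time -- delay spent outside the queueing stations, in seconds

    Returns (response_time, throughput, mean_queue_lengths).
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")

    # With zero jobs in the network every queue is empty.
    queue = {station: 0.0 for station in demands}
    response = 0.0
    throughput = 0.0

    for n in range(1, n_jobs + 1):
        # Arrival theorem: an arriving job sees the queue lengths of the
        # same network with one job fewer.
        residence = {s: d * (1.0 + queue[s]) for s, d in demands.items()}
        response = sum(residence.values())
        # Little's law applied to the whole network.
        throughput = n / (think_time + response)
        # Little's law applied to each station.
        queue = {s: throughput * r for s, r in residence.items()}

    return response, throughput, queue


if __name__ == "__main__":
    # Illustrative demands only: 0.5 s of CPU and 0.5 s of disk per job.
    r, x, q = mva_closed_network({"cpu": 0.5, "disk": 0.5}, n_jobs=2)
    print(f"response time = {r:.3f} s, throughput = {x:.3f} jobs/s")
    print("mean queue lengths:", q)
```

The recursion costs time proportional to the number of jobs times the number of stations, which is the kind of cost advantage over simulation that the abstract attributes to analytical models.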
Author Name | Affiliation | E-mail |
WU Yue | Forest Industry Planning and Design Institute, National Forestry and Grassland Administration, Beijing 100010, China | wuyue98@126.com |
Citation:
吴岳.用于Hadoop2.x的MapReduce性能评估模型.计算机系统应用,2021,30(2):219-225
WU Yue. MapReduce Performance Evaluation Model for Hadoop2.x. COMPUTER SYSTEMS APPLICATIONS, 2021, 30(2): 219-225