Computer Systems Applications, 2021, 30(2): 219-225
MapReduce Performance Evaluation Model for Hadoop2.x
(Forest Industry Planning and Design Institute, National Forestry and Grassland Administration, Beijing 100010, China)
Received: June 29, 2020    Revised: July 27, 2020
Abstract: MapReduce-based systems are increasingly used for large-scale data analysis, and Apache Hadoop is one of the most common open-source implementations of this paradigm. Minimizing execution time is vital for MapReduce programs, as for all data-processing applications, and accurately estimating that execution time is an essential step in optimization. This study defines a performance model for Hadoop2.x that accurately estimates the execution time of a MapReduce job workload. The model combines a precedence tree model, which captures the dependencies between different tasks within one MapReduce job, with a queueing network model, which captures the intra-job synchronization constraints. Such an analytical performance model is attractive because it may provide reasonably accurate job response times at significantly lower cost than simulation experiments on real data-analysis systems. A clear understanding of job response time under different circumstances is also key to decisions in MapReduce workload management and resource capacity planning. Finally, experiments demonstrate the usability of the model.
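As a rough illustration of how a precedence tree can express task dependencies and synchronization points, the following minimal sketch evaluates a tree whose leaves are tasks and whose internal nodes are either serial or parallel operators. It is not the model from the paper: the class names (`Task`, `Serial`, `Parallel`), the `estimate` function, and the task durations are all hypothetical, and no queueing network is solved here. Serial children are assumed to run one after another, and a parallel node is assumed to finish only when its slowest child finishes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Task:
    """Leaf node: one task with an assumed (hypothetical) duration in seconds."""
    name: str
    duration: float


@dataclass
class Serial:
    """Internal node: children run one after another."""
    children: List["Node"]


@dataclass
class Parallel:
    """Internal node: children run concurrently and are joined at the end."""
    children: List["Node"]


Node = Union[Task, Serial, Parallel]


def estimate(node: Node) -> float:
    """Estimate the completion time of the subtree rooted at `node`."""
    if isinstance(node, Task):
        return node.duration
    times = [estimate(child) for child in node.children]
    if isinstance(node, Serial):
        # Sequential dependency: durations add up.
        return sum(times)
    # Parallel join: the slowest child determines when the node completes.
    return max(times)


# Hypothetical job: two map tasks in parallel, then one reduce task.
job = Serial([
    Parallel([Task("map-1", 12.0), Task("map-2", 15.0)]),
    Parallel([Task("reduce-1", 8.0)]),
])

print(estimate(job))  # 15.0 (slowest map) + 8.0 (reduce) = 23.0
```

In this toy job the reduce phase cannot start until both map tasks finish, so the estimate is driven by the slower map task. A fuller model would replace the fixed durations with response times that account for contention on shared resources.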
Cite this article:
WU Yue. MapReduce Performance Evaluation Model for Hadoop2.x. COMPUTER SYSTEMS APPLICATIONS, 2021, 30(2): 219-225