Survey of Developments of MapReduce Parallel Computing Technology
Authors: Ying Yi, Liu Yajun
Funding: Jiangsu Province Outstanding Engineer (Software) Program pilot specialty (Su Jiao Gao Han [2012] No. 17)

    Abstract:

    Over the past several years, a number of improved frameworks have emerged from the MapReduce parallel programming model, each correcting or rewriting the original MapReduce (MRv1) to address its shortcomings. This paper describes and analyzes these research results, including iterative computing frameworks represented by HaLoop, real-time computing frameworks represented by Twitter Storm, graph computing frameworks represented by Apache Hama, and framework management platforms for negotiating computing resources, represented by Apache YARN. These specialized systems play an increasingly important role in the big data field.

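Cite this article

Ying Yi, Liu Yajun. Survey of developments of MapReduce parallel computing technology. Computer Systems & Applications, 2014, 23(4): 1-6, 11.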

History
  • Received: 2013-09-06
  • Last revised: 2013-10-12
  • Published online: 2014-04-25