Received: March 08, 2015 Revised: May 18, 2015
Abstract: HDFS provides the underlying storage for Hadoop, but it handles massive numbers of small files inefficiently, which seriously degrades system performance. To address this problem, we design a scheme for merging, indexing, and retrieving small files, and compare it with the original HDFS and with the HAR file archiving scheme. A series of experiments shows that our scheme effectively reduces Namenode memory usage and improves the I/O performance of HDFS.
Keywords: Hadoop; HDFS; small files; I/O performance of HDFS
Funding: 2013 Science and Technology Support Program of the Ministry of Science and Technology (2013BAJ10B14-5)
Citation:
LI Xu, LI Chang-Yun, ZHANG Qing-Qing, HU Shu-Xin, ZHOU Ling-Fang. Methods of Dealing With Massive Small Files in Hadoop. COMPUTER SYSTEMS APPLICATIONS (计算机系统应用), 2015, 24(11): 157-161