Computer Systems Applications: 2012, 21(11): 179-182
Efficiency of Storing Small Files in HDFS Based on MapFile
(School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China)
Received: March 28, 2012    Revised: May 1, 2012
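Abstract (translated from the Chinese): HDFS was originally developed for streaming access to large files, and its storage efficiency for large numbers of small files is poor. To address this, a scheme for storing small files in HDFS is designed using MapFile. The main idea is to add a file-type judgment module to the HDFS upload path, build a queue of small files, serialize the small files into a MapFile container so that they are merged into one large file, and create the corresponding index file, which reduces the number of files and improves access efficiency. In a comparison with the existing Hadoop Archives (HAR files) approach to the small-file problem, experimental results show that the MapFile-based scheme for storing small files can more effectively ...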
Keywords: HDFS; small files; MapFile; SequenceFile; cloud storage
Abstract: The Hadoop Distributed File System (HDFS) can process large amounts of data effectively across large clusters. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. An approach based on MapFile is proposed to improve the storage efficiency of small files in HDFS. The main idea is to add a file-type judgment module to the upload path, create a queue of small files, serialize the queued small files into a MapFile container, and build the corresponding index file. Experimental results show that the storage efficiency of small files is improved compared with Hadoop Archives (HAR files).
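The upload flow outlined in the abstract (judge the file type, queue the small files, merge them into one key-sorted container, build an index) can be illustrated with a minimal plain-Python sketch. This is not Hadoop's MapFile API and not the authors' implementation: the names `is_small`, `upload`, `pack`, and `lookup`, the 1 MiB threshold, the record layout, and the dense in-memory index are all assumptions made for illustration only.

```python
import bisect
import io
import struct
from collections import deque

# Hypothetical cutoff; the abstract does not state a threshold.
SMALL_FILE_THRESHOLD = 1024 * 1024


def is_small(payload, threshold=SMALL_FILE_THRESHOLD):
    """File-type judgment step, reduced here to a size check (an assumption)."""
    return len(payload) < threshold


def upload(name, payload, queue, large_files):
    """Route a file: small files wait in the queue, large ones are stored directly."""
    if is_small(payload):
        queue.append((name, payload))
    else:
        large_files[name] = payload


def pack(queue):
    """Drain the queue into one data blob plus an index of (name, offset).

    Records are written in key order, mirroring the sorted-key layout
    that a MapFile-style container relies on for lookups.
    """
    data = io.BytesIO()
    index = []
    for name, payload in sorted(queue):
        index.append((name, data.tell()))
        key = name.encode("utf-8")
        # Record layout (illustrative): key length, value length, key, value.
        data.write(struct.pack(">II", len(key), len(payload)))
        data.write(key)
        data.write(payload)
    queue.clear()
    return data.getvalue(), index


def lookup(data, index, name):
    """Binary-search the index, then read a single record from the blob."""
    names = [n for n, _ in index]
    i = bisect.bisect_left(names, name)
    if i == len(names) or names[i] != name:
        return None
    offset = index[i][1]
    key_len, val_len = struct.unpack_from(">II", data, offset)
    start = offset + 8 + key_len
    return data[start:start + val_len]


if __name__ == "__main__":
    queue, large_files = deque(), {}
    upload("b.txt", b"beta", queue, large_files)
    upload("a.txt", b"alpha", queue, large_files)
    data, index = pack(queue)
    print(lookup(data, index, "a.txt"))
```

In this sketch many small files collapse into a single blob and a single index, so the storage layer would track two objects instead of one per small file, which is the effect the scheme aims for.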
Citation:
HONG Xu-Sheng, LIN Shi-Ping. Efficiency of Storaging Small Files in HDFS Based on MapFile. Computer Systems Applications, 2012, 21(11): 179-182