Efficiency of Storing Small Files in HDFS Based on MapFile

2012, Vol. 21, No. 11, pp. 179-182




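Abstract (translated from the Chinese):

HDFS was originally developed for streaming access to large files, so it stores large numbers of small files inefficiently. To address this, a scheme for storing small files in HDFS is designed using MapFile. The main idea is to add a file-type judgment module at upload time, build a small-file queue, serialize the small files into a MapFile container so that they are merged into one large file, and build the corresponding index file, which effectively reduces the number of files and improves access efficiency. In a comparison with the existing Hadoop Archives (HAR files) approach to the small-file problem, the experimental results show that the MapFile-based scheme for storing small files can more ...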

Abstract:

The Hadoop Distributed File System (HDFS) can process large amounts of data effectively on large clusters. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. An approach based on MapFile is proposed to improve the storage efficiency of small files in HDFS. The main idea is to add a file-type judgment module that runs when a file is uploaded, create a small-file queue, serialize the queued small files into a MapFile container, and establish the corresponding index file. Experimental results show that the storage efficiency of small files is improved compared with Hadoop Archives (HAR files).
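The upload path described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not call Hadoop: `ToyMapFile` is a hypothetical in-memory stand-in for a MapFile (one merged data blob plus a sorted key index), `Uploader` is a hypothetical name for the judgment module and queue, and the 1 MB cutoff in `SMALL_FILE_THRESHOLD` is an assumed value that the abstract does not specify.

```python
import bisect
import io
from collections import deque

# Assumed cutoff: files below this size are treated as "small".
SMALL_FILE_THRESHOLD = 1024 * 1024


class ToyMapFile:
    """In-memory stand-in for a MapFile: merged data plus a sorted key index."""

    def __init__(self, entries):
        self.data = io.BytesIO()   # the merged "large file"
        self.keys = []             # sorted file names (the index keys)
        self.spans = []            # (offset, length) per key, parallel to keys
        # Keys are written in sorted order so lookups can use binary search.
        for name, payload in sorted(entries):
            self.keys.append(name)
            self.spans.append((self.data.tell(), len(payload)))
            self.data.write(payload)

    def get(self, name):
        """Return the bytes stored under name, or None if it is absent."""
        i = bisect.bisect_left(self.keys, name)
        if i == len(self.keys) or self.keys[i] != name:
            return None
        offset, length = self.spans[i]
        self.data.seek(offset)
        return self.data.read(length)


class Uploader:
    """Routes each upload by size: small files are queued, large ones stored directly."""

    def __init__(self, threshold=SMALL_FILE_THRESHOLD):
        self.threshold = threshold
        self.queue = deque()       # the small-file queue
        self.large_files = {}      # stands in for files written straight to HDFS

    def upload(self, name, payload):
        # File-type judgment: decide whether the file counts as small.
        if len(payload) < self.threshold:
            self.queue.append((name, payload))
        else:
            self.large_files[name] = payload

    def flush(self):
        """Merge every queued small file into one container and empty the queue."""
        container = ToyMapFile(self.queue)
        self.queue.clear()
        return container


uploader = Uploader(threshold=10)
uploader.upload("b.txt", b"bbb")
uploader.upload("a.txt", b"aa")
uploader.upload("big.bin", b"x" * 20)
container = uploader.flush()
print(container.keys, container.get("a.txt"))
```

For simplicity the toy index records every key. Hadoop's MapFile keeps a sparser index alongside its data file, so a real lookup would also scan a short run of records after the index seek.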


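Cite this article

洪旭升 (Hong Xusheng), 林世平 (Lin Shiping). Efficiency of Storing Small Files in HDFS Based on MapFile. 计算机系统应用 (Computer Systems & Applications), 2012, 21(11): 179-182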


History

Received: 2012-03-28
Last revised: 2012-05-01
