Storage Efficiency of Small Files in HDFS Based on MapFile
    Abstract:

    The Hadoop Distributed File System (HDFS) can process large amounts of data effectively on large clusters. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. An approach based on MapFile is proposed to improve the storage efficiency of small files in HDFS. The main idea is to add a file-type judgment module to the upload path, place small files in a queue, serialize the queued small files into a MapFile container, and build an index file for that container. Experimental results show that the storage efficiency of small files is improved compared with Hadoop Archives (HAR files).
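    The upload flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Hadoop's MapFile API: a plain in-memory byte container with a sorted name-to-offset index stands in for the MapFile data and index files. The size threshold and the function names `is_small`, `pack`, `read`, and `upload` are assumptions made for illustration only.

```python
import io
import struct
from bisect import bisect_left
from collections import deque

# Hypothetical cutoff in bytes; the abstract does not state a threshold.
SMALL_FILE_THRESHOLD = 1024


def is_small(data: bytes, threshold: int = SMALL_FILE_THRESHOLD) -> bool:
    """File-type judgment step: classify a file by its size."""
    return len(data) < threshold


def pack(small_files):
    """Serialize (name, bytes) pairs into one container, sorted by name.

    Returns (container_bytes, index), where index is a sorted list of
    (name, offset) pairs. This stands in for a sorted data file plus a
    separate index file; every entry is indexed here for simplicity.
    """
    container = io.BytesIO()
    index = []
    for name, data in sorted(small_files):
        index.append((name, container.tell()))
        key = name.encode("utf-8")
        # Record layout: key length, value length, key bytes, value bytes.
        container.write(struct.pack(">II", len(key), len(data)))
        container.write(key)
        container.write(data)
    return container.getvalue(), index


def read(container: bytes, index, name: str) -> bytes:
    """Look up one small file by binary search over the sorted index."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, name)
    if i == len(keys) or keys[i] != name:
        raise KeyError(name)
    offset = index[i][1]
    key_len, data_len = struct.unpack_from(">II", container, offset)
    start = offset + 8 + key_len
    return container[start:start + data_len]


def upload(files, threshold: int = SMALL_FILE_THRESHOLD):
    """Queue small files for packing; pass large files through unchanged."""
    queue = deque()
    large = {}
    for name, data in files.items():
        if is_small(data, threshold):
            queue.append((name, data))
        else:
            large[name] = data
    container, index = pack(queue)
    return large, container, index
```

    In this sketch, calling `upload` on a dictionary of file names and contents returns the large files untouched together with one packed container and its index, and `read` then retrieves any small file by name. Packing many small files into one container is what reduces the number of objects the file system has to track.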

Citation:

Hong Xusheng, Lin Shiping. Storage efficiency of small files in HDFS based on MapFile. Computer Systems & Applications (计算机系统应用), 2012, 21(11): 179-182

History
  • Received: March 28, 2012
  • Revised: May 1, 2012
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3