Received: October 19, 2015; Revised: November 25, 2015
Abstract: Compared with traditional relational databases, the distributed database HBase has a clear advantage in large-scale data loading, but it still leaves considerable room for optimization. We build an HBase environment on the Hadoop distributed platform and optimize a custom data loading algorithm. First, we analyze HBase's underlying data storage and show experimentally that HBase's built-in data loading methods fall short in efficiency and flexibility. We then propose a custom parallel data loading algorithm and optimize it for the cluster. The experimental results show that the optimized custom parallel data loading method makes full use of cluster performance and achieves good loading efficiency and data manipulation capability.
Keywords: HBase; Hadoop; MapReduce; data loading; performance optimization
Citation:
HE Zheng-Hong, ZHOU Ya, WEN Di-Yao, WU Qing-Xia. Research on Large Scale Data Loading Based on HBase. Computer Systems Applications, 2016, 25(6): 231-237