High Availability Dual Engine Data Warehouse Based on Hive
    Abstract:

    Breaking down isolated information islands, integrating heterogeneous data, aggregating and sharing it, conducting in-depth analysis and mining, and supporting industry-level decision-making and situation analysis all have far-reaching theoretical and practical value. Driven by the actual needs of the situational awareness service of the Chinese Academy of Sciences, this study designs and implements a Hive-based big data warehouse with dual Hadoop/Spark computing engines that supports multiple modes of OLAP analysis. It also optimizes the design for availability, load balancing, and resource management, providing platform support for subsequent data aggregation and mining, knowledge graph construction, and discipline situation analysis. Experimental results show that the system is flexible, efficient, highly available, and scalable, that resources are scheduled sensibly, and that load balancing is clearly effective.

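Citation:

Li Chong, Zhang Tongtong, Du Weijing, Liu Xuemin. High Availability Dual Engine Data Warehouse Based on Hive. Computer Systems & Applications, 2019, 28(9): 65-71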

History
  • Received: February 28, 2019
  • Revised: March 14, 2019
  • Online: September 9, 2019
  • Published: September 15, 2019