Big RDF Graph Query System Based on Spark and Redis
Author:
    Abstract:

    With the continued development of Semantic Web technology, the volume of RDF data is growing rapidly, and single-node RDF query systems can no longer meet practical needs. Building distributed RDF query systems has therefore become a research hotspot in both academia and industry. Existing distributed RDF query systems are mainly based on either Hadoop or general-purpose distributed technologies: the former incur heavy disk I/O, while the latter scale poorly, and both are inefficient for basic graph pattern (BGP) queries. To address these problems, we design a distributed system architecture based on Spark and Redis, improve the query plan generation algorithm, and implement a prototype system called RDF-SR. The system uses Spark to reduce disk I/O, Redis to speed up data mapping, and the improved algorithm to reduce the number of data shuffles. Experiments show that, compared with other existing systems, RDF-SR maintains high scalability while achieving better performance on basic graph pattern queries.
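    To make two ideas from the abstract concrete, the following is a minimal single-machine sketch: mapping RDF terms to compact integer IDs (the "data mapping" role the abstract assigns to Redis) and answering a basic pattern query by joining triple patterns. It is not the RDF-SR implementation. The names `TermDictionary`, `match_pattern`, and `evaluate_bgp` are hypothetical, a plain Python dict stands in for Redis, a list stands in for a distributed dataset, and the "fewest variables first" ordering is an assumed heuristic rather than the paper's query plan generation algorithm.

```python
from itertools import count


class TermDictionary:
    """Maps RDF terms to integer IDs and back.

    Assumption: a plain dict stands in for a key-value store such as
    Redis; no Redis client is used here.
    """

    def __init__(self):
        self._ids = {}      # term -> id
        self._terms = {}    # id -> term
        self._next = count(1)

    def encode(self, term):
        # Assign a fresh ID the first time a term is seen.
        if term not in self._ids:
            new_id = next(self._next)
            self._ids[term] = new_id
            self._terms[new_id] = term
        return self._ids[term]

    def decode(self, term_id):
        return self._terms[term_id]


def is_var(x):
    # Variables are strings such as "?s"; constants are integer IDs.
    return isinstance(x, str) and x.startswith("?")


def match_pattern(pattern, triples, binding):
    """Yield extensions of `binding` that satisfy one triple pattern."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for p, v in zip(pattern, triple):
            if is_var(p):
                # Bind the variable, or check it against its earlier value.
                if new.setdefault(p, v) != v:
                    ok = False
                    break
            elif p != v:
                ok = False
                break
        if ok:
            yield new


def evaluate_bgp(patterns, triples):
    """Join all triple patterns; returns a list of variable bindings."""
    # Assumed heuristic: evaluate patterns with fewer variables first so
    # that intermediate results stay small.
    ordered = sorted(patterns, key=lambda pat: sum(is_var(x) for x in pat))
    bindings = [{}]
    for pat in ordered:
        bindings = [b2 for b in bindings
                    for b2 in match_pattern(pat, triples, b)]
    return bindings


# Toy data, invented for illustration only.
d = TermDictionary()
raw = [
    ("alice", "type", "Student"),
    ("bob", "type", "Student"),
    ("alice", "takes", "db101"),
    ("bob", "takes", "os201"),
    ("db101", "taughtBy", "carol"),
]
triples = [tuple(d.encode(t) for t in tr) for tr in raw]

# Which students take a course, and who teaches it?
query = [
    ("?s", d.encode("takes"), "?c"),
    ("?s", d.encode("type"), d.encode("Student")),
    ("?c", d.encode("taughtBy"), "?t"),
]
results = evaluate_bgp(query, triples)
decoded = [{k: d.decode(v) for k, v in r.items()} for r in results]
print(decoded)
```

    In a distributed setting each join step between patterns would typically require moving data between workers, which is why the order and grouping of patterns affects the number of shuffles; the sketch only shows the join logic itself.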

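Cite this article

阳杰 (Yang Jie), 王木涵 (Wang Muhan), 徐九韵 (Xu Jiuyun). Big RDF Graph Query System Based on Spark and Redis. 计算机系统应用 (Computer Systems & Applications), 2017, 26(9):69-74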

History
  • Received: 2016-12-13
  • Published online: 2017-10-31