Big RDF Graph Query System Based on Spark and Redis

doi:10.15888/j.cnki.csa.005923


2017, Vol. 26, No. 9, pp. 69-74

Authors:

YANG Jie, WANG Mu-Han, XU Jiu-Yun
School of Computer & Communication Engineering, China University of Petroleum (East China), Qingdao 266580, China

Abstract:

With the continued development of Semantic Web technology, the volume of RDF data is growing rapidly, and single-node RDF query systems can no longer meet practical needs. Researching and building distributed RDF query systems has therefore become a hot topic in both academia and industry. Existing distributed RDF query systems are mainly built either on Hadoop or on general-purpose distributed technologies: the former incur high disk I/O, while the latter scale poorly. Both kinds of system are also inefficient at basic graph pattern queries. To address these problems, this paper designs a distributed system architecture based on Spark and Redis, improves the query plan generation algorithm, and implements a prototype system, RDF-SR. The system uses Spark to reduce disk I/O, uses Redis to speed up data mapping, and uses the improved algorithm to reduce the number of data shuffles. Experiments show that, compared with other existing systems, RDF-SR maintains high scalability while achieving better performance on basic graph pattern queries.

Key words: semantic Web; big RDF graph; Spark; Redis
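
For readers unfamiliar with the term, the following is a minimal single-machine sketch of what a basic graph pattern query computes. It is not the RDF-SR algorithm and uses neither Spark nor Redis; the sample triples, the query, and the names `match_pattern` and `evaluate_bgp` are assumptions made for illustration only. Each triple pattern is joined with the solutions found so far on their shared variables. In a distributed engine, each such join step may require moving data between nodes, which is presumably why the order and number of joins in the query plan matter.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None on conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate_bgp(patterns: List[Triple], triples: Iterable[Triple]) -> List[Binding]:
    """Evaluate the patterns left to right with a naive nested-loop join."""
    data = list(triples)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical sample data and query (not taken from the paper).
TRIPLES = [
    ("alice", "rdf:type", "Student"),
    ("bob", "rdf:type", "Student"),
    ("alice", "takesCourse", "course1"),
    ("carol", "teacherOf", "course1"),
]
QUERY = [
    ("?s", "rdf:type", "Student"),
    ("?s", "takesCourse", "?c"),
    ("?t", "teacherOf", "?c"),
]

if __name__ == "__main__":
    for solution in evaluate_bgp(QUERY, TRIPLES):
        print(solution)
```

The three patterns share the variables `?s` and `?c`, so the query needs two joins. A real engine would pick the join order from a query plan rather than evaluating patterns in the order written.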


Cite this article:

YANG Jie, WANG Mu-Han, XU Jiu-Yun. Big RDF Graph Query System Based on Spark and Redis. Computer Systems & Applications, 2017, 26(9): 69-74.


History:

Received: 2016-12-13
Published online: 2017-10-31
