Improve Big Data Query Performance by Applying Distributed Indexing

Computer Systems & Applications, 2014, Vol. 23, No. 6, pp. 259-261


Authors:
DOU Xiao-Feng, CHEN Sheng, WANG Yi-Hang, MAI Lian-Tao, YOU Jian-Hong

Affiliation:
Department of China Unicom, Asiainfo-Linkage, Beijing 100086, China


Abstract:

Precision marketing and ad-hoc query services in the telecommunications field involve many random queries against one or several wide tables (tables with more than 50 fields). In the traditional processing mode, where queries run directly against the database, response time can be optimized to between a few seconds and tens of seconds while the data volume is modest (fewer than 10 million records). Once the data reaches tens of millions, hundreds of millions, or more than a billion records, this mode cannot meet second-level concurrent query requirements, however it is optimized and whatever indexing mechanism is used. The new processing mode addresses this by introducing a distributed Solr index layer. The index layer builds an index over the database records in advance, and queries go to the index layer instead of the database, which greatly improves query performance. The two modes were compared in the same environment on a wide-table query scenario with 50 million records and 20 concurrent accesses per second: every query in the traditional mode timed out and failed, whereas every query against the distributed index layer succeeded and returned within 2 seconds.

Key words: precision marketing; ad-hoc query; massive data; big data; query; Solr cluster; distributed indexing; sharding; B-tree
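
The index-layer idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not use Solr: it is a pure-Python stand-in that assumes each shard holds an inverted index from (field, value) pairs to record ids, that records are routed to shards by a hash of their id, and that a query is fanned out to all shards and the partial results merged. The names Shard, ShardedIndex, province, and plan are hypothetical and chosen only for illustration.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """One index shard: an inverted index from (field, value) to record ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, record):
        # Index every field of the wide-table row, built ahead of query time.
        for field, value in record.items():
            self.postings[(field, value)].add(doc_id)

    def search(self, filters):
        # Intersect the posting sets of all equality conditions.
        sets = [self.postings.get(cond, set()) for cond in filters.items()]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


class ShardedIndex:
    """Hypothetical index layer: routes writes by hash, fans reads out to all shards."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def _route(self, doc_id):
        # Stable hash, so a given record id always maps to the same shard.
        return zlib.crc32(str(doc_id).encode("utf-8")) % len(self.shards)

    def index(self, doc_id, record):
        self.shards[self._route(doc_id)].add(doc_id, record)

    def query(self, filters):
        # Query each shard in parallel, then merge the partial id lists.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(pool.map(lambda s: s.search(filters), self.shards))
        return sorted(doc_id for part in partials for doc_id in part)


if __name__ == "__main__":
    layer = ShardedIndex(num_shards=4)
    for uid in range(1000):
        layer.index(uid, {
            "province": "BJ" if uid % 2 == 0 else "SH",
            "plan": "3G" if uid % 5 == 0 else "2G",
        })
    hits = layer.query({"province": "BJ", "plan": "3G"})
    print(len(hits))  # 100
```

The query never scans the rows themselves. Each shard only intersects precomputed posting sets, which is the property the abstract relies on when it moves queries from the database to the index layer.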

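Cite this article

DOU Xiao-Feng, CHEN Sheng, WANG Yi-Hang, MAI Lian-Tao, YOU Jian-Hong. Improve Big Data Query Performance by Applying Distributed Indexing. Computer Systems & Applications, 2014, 23(6): 259-261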

Article metrics

Views: 1792
Downloads: 3798
HTML views: 0
Citations: 0

History

Received: 2013-09-07
Revised: 2013-11-27
Published online: 2014-06-20
