Method Research to Solve Shuffle Data Skew Based on Broadcast

doi:10.15888/j.cnki.csa.006985


Author: WU En-Ci

Affiliation: Shanghai Qiyu Information Technology Co. Ltd., Shanghai 200120, China


Abstract:

On the Spark computing platform, data skew often subjects a few nodes to disproportionate network traffic and computational load, which strains the cluster's CPU, memory, disk, and network resources and degrades the performance of the whole cluster. By studying the design and implementation of Spark Shuffle, this paper analyzes the root causes of data skew in large-scale distributed environments. It proposes a method that uses the broadcast mechanism to avoid data skew during the shuffle process, analyzes the distribution logic of broadcast variables, and presents both a performance analysis of broadcast variables and an algorithmic implementation of the method. Broadcast Join experiments confirm that the method yields a stable performance improvement.

Keywords: data skew; partitioning strategy; shuffle; broadcast mechanism
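
The idea summarized in the abstract can be illustrated with a small simulation. The following is a minimal sketch in plain Python, not the paper's implementation and not Spark code: the function names, the sample data, and the choice of four partitions are all assumptions made for illustration. It contrasts hash partitioning by join key, where one hot key overloads a single partition, with a map-side join in which each evenly sized partition probes a local copy of the small table.

```python
from collections import Counter


def shuffle_partition_sizes(records, num_partitions):
    """Count the records each partition receives under hash partitioning by key."""
    sizes = Counter()
    for key, _ in records:
        sizes[hash(key) % num_partitions] += 1
    return [sizes[p] for p in range(num_partitions)]


def broadcast_join(large_partitions, small_table):
    """Map-side join: each partition probes its own copy of the small table."""
    lookup = dict(small_table)  # stands in for a broadcast variable (assumption)
    return [
        [(k, v, lookup[k]) for k, v in partition if k in lookup]
        for partition in large_partitions
    ]


# Hypothetical skewed data: key 0 accounts for 90 of 100 records.
# Integer keys are used so that hash(key) is deterministic across runs.
records = [(0, i) for i in range(90)] + [(k, k) for k in range(1, 11)]
small_table = [(k, f"dim{k}") for k in range(11)]

# Shuffle-style placement: the hot key piles up on one partition.
shuffle_sizes = shuffle_partition_sizes(records, 4)

# Broadcast-style placement: the large table keeps its even split, no shuffle by key.
even_partitions = [records[i:i + 25] for i in range(0, len(records), 25)]
broadcast_sizes = [len(p) for p in broadcast_join(even_partitions, small_table)]

print(shuffle_sizes)    # [92, 3, 3, 2]
print(broadcast_sizes)  # [25, 25, 25, 25]
```

Under these assumptions, the hot key concentrates 92 of the 100 records on one partition when rows are routed by key, while the broadcast variant leaves every partition with 25 joined rows. The trade-off is that the small table must fit in the memory of every worker.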

Cite this article

WU En-Ci. Method Research to Solve Shuffle Data Skew Based on Broadcast. Computer Systems & Applications, 2019, 28(6): 189-197


History

Received: 2018-12-19
Last revised: 2019-01-15
Published online: 2019-05-28
Publication date: 2019-06-15
