Optimization of Deep Learning Workload on Kubernetes Cluster

doi:10.15888/j.cnki.csa.008672

Computer Systems & Applications, 2022, Vol. 31, No. 9, pp. 114-126

Abstract:

Rapid advances in artificial intelligence (AI) and the efficiency of deploying applications on cloud-native platforms have led a growing number of developers and internet companies to run AI applications on Kubernetes clusters. Kubernetes, however, was not designed primarily for deep learning, so this domain requires customized optimization. For the scenario of deploying deep learning workloads on Kubernetes clusters of a certain scale, this study designs and implements a series of optimization schemes covering the main requirements of deep learning: data processing, graphics processing unit (GPU) computation, and distributed training. These techniques greatly reduce the difficulty of deploying AI workloads on large-scale cloud-native platforms and improve their runtime efficiency. Practical use also confirms that they bring significant improvements to AI applications.

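Cite this article

Chen Pei, Wang Chao, Duan Guodong, Wang Dekui, Wang Bin, Wang Wenxiao, Sun Liaodong, Jing Rongxun, Xing Liangzhan, Liu Huixing, Ji Guiyang. Optimization of Deep Learning Workload on Kubernetes Cluster. Computer Systems & Applications, 2022, 31(9): 114-126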

History

Received: 2021-12-04
Revised: 2022-01-04
Published online: 2022-06-16
