Received: December 04, 2021; Revised: January 04, 2022
Abstract: Driven by the rapid development of artificial intelligence (AI) technologies and the efficiency of deploying applications on cloud-native platforms, a growing number of developers and internet companies deploy AI applications on Kubernetes clusters. However, Kubernetes was not designed primarily for deep learning, and this specialized domain requires customized optimization. For the scenario of running deep learning workloads on reasonably large Kubernetes clusters, this study designs and implements a series of optimization schemes that target what deep learning demands most: data processing, graphics processing unit (GPU) computation, and distributed training. These techniques greatly reduce the difficulty of deploying AI workloads on large-scale cloud-native platforms and improve their runtime efficiency, and practical experience confirms that they bring significant improvements to AI applications.
Keywords: Kubernetes; deep learning; distributed training; CUDA; workload optimization; artificial intelligence
Citation:
CHEN Pei, WANG Chao, DUAN Guo-Dong, WANG De-Kui, WANG Bin, WANG Wen-Xiao, SUN Liao-Dong, JING Rong-Xun, XING Liang-Zhan, LIU Hui-Xing, JI Gui-Yang. Optimization of Deep Learning Workload on Kubernetes Cluster. Computer Systems Applications, 2022, 31(9): 114-126.