Optimization of Deep Learning Workload on Kubernetes Cluster
    Abstract:

    The rapid development of artificial intelligence (AI) technologies, together with the efficiency of deploying AI applications on cloud-native platforms, has led a growing number of developers and internet companies to run AI applications on Kubernetes clusters. However, Kubernetes was not designed primarily for deep learning, a specialized field that requires customized optimization. For the scenario of deploying deep learning workloads on Kubernetes clusters of a certain scale, this study designs and implements a series of optimization schemes covering the three areas that deep learning depends on: data processing, graphics processing unit (GPU) computation, and distributed training. These schemes reduce the difficulty of deploying AI workloads on large-scale cloud-native platforms and greatly improve operational efficiency. Practical use also confirms that they bring significant improvements to AI applications.

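Citation

Chen Pei, Wang Chao, Duan Guodong, Wang Dekui, Wang Bin, Wang Wenxiao, Sun Liaodong, Jing Rongxun, Xing Liangzhan, Liu Huixing, Ji Guiyang. Optimization of Deep Learning Workload on Kubernetes Cluster. Computer Systems & Applications, 2022, 31(9): 114-126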

History
  • Received: December 4, 2021
  • Revised: January 4, 2022
  • Online: June 16, 2022