AI Scheduling Engine Platform Based on Kubernetes

doi:10.15888/j.cnki.csa.009182

Published in: 2023, Vol. 32, No. 8, pp. 86-94

Authors:

LIU Xiang (刘祥)
Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, China

HU Rui-Min (胡瑞敏)
Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China

WANG Hai-Bin (王海滨)
Xiamen Meiya Baike Information Co. Ltd., Xiamen 361008, China

Abstract:

This paper presents the design and implementation of an AI scheduling engine platform based on Kubernetes. Current AI scheduling systems suffer from complex service configuration, unbalanced utilization of computing resources across cluster nodes, and high operation and maintenance costs. To address these problems, the paper proposes a Kubernetes-based solution for container scheduling and service management. Starting from the requirements of the AI scheduling engine platform, each module of the platform is designed in terms of functional implementation and platform architecture. Because Kubernetes cannot perceive GPU resources on its own, a device plugin is introduced to collect GPU information from each node in the cluster and report it to the scheduler. In addition, the priority algorithms in the Kubernetes scheduling strategy consider only the resource utilization and balance of the node itself, and ignore the differences in resource demand among different types of applications. The paper therefore proposes a priority algorithm based on the Pearson correlation coefficient (PCC), which decides where a Pod is scheduled by calculating the degree of complementarity between the container's resource demand and the node's resource utilization, so that resources remain balanced across nodes after scheduling.

Key words: Kubernetes | container | scheduling | Pearson correlation coefficient (PCC)
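The PCC-based priority idea described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual formula, so everything below is an assumption made for illustration: the resource order (CPU, memory, GPU), the node names, the linear mapping from the coefficient to a 0-100 score, and the handling of constant vectors. It also assumes that nodes without enough free capacity have already been filtered out. The intuition is that a strongly negative correlation between what a Pod requests and what a node already uses means the Pod needs the resources the node has left idle.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x < 1e-12 or var_y < 1e-12:
        # The coefficient is undefined for a constant vector.
        # Treating it as neutral (0.0) is an assumption of this sketch.
        return 0.0
    return cov / sqrt(var_x * var_y)


def complementarity_score(pod_request, node_utilization):
    """Map the coefficient from [-1, 1] to a score in [0, 100].

    Hypothetical mapping: -1 (fully complementary) scores 100,
    +1 (same resources contended) scores 0.
    """
    r = pearson(pod_request, node_utilization)
    return (1.0 - r) / 2.0 * 100.0


def pick_node(pod_request, nodes):
    """Return the name of the node with the highest complementarity score."""
    return max(nodes, key=lambda name: complementarity_score(pod_request, nodes[name]))


# Assumed resource order: CPU, memory, GPU, each as a fraction of node capacity.
pod = [0.10, 0.05, 0.50]           # GPU-heavy container
nodes = {
    "node-a": [0.30, 0.40, 0.90],  # GPU already busy
    "node-b": [0.70, 0.60, 0.10],  # CPU and memory busy, GPU mostly idle
}
print(pick_node(pod, nodes))  # prints "node-b"
```

In this example the GPU-heavy Pod is placed on the node whose GPU is mostly idle, which is the balancing effect the abstract attributes to the PCC-based algorithm. A production scheduler would combine such a score with the other priority functions rather than use it alone.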

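Cite this article

LIU Xiang, HU Rui-Min, WANG Hai-Bin. AI Scheduling Engine Platform Based on Kubernetes. Computer Systems & Applications (计算机系统应用), 2023, 32(8): 86-94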

History

Received: 2023-01-09
Last revised: 2023-02-09
Published online: 2023-05-22
