Distributed Task Execution System for Deep Learning
    Abstract:

    A full-lifecycle hosting platform for deep learning offers a web-based solution for experimental tasks and promotes the application of deep learning technology in production and daily life. To support the training of image recognition models on such a platform, this study designs and implements a distributed task execution system for experimental tasks. The system consists of four modules: resource monitoring, task scheduling, task execution, and log management. It schedules tasks according to indicators such as resource utilization, executes them in Docker containers, and collects the generated log data in real time. Test results show that the system meets its functional requirements and its reliability and stability targets, and that integrating it into the deep learning platform reduces training time by about 20%.
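
    As a rough illustration of utilization-based scheduling, the sketch below picks the least-loaded worker from a set of monitoring snapshots. The abstract does not state the actual indicators or policy, so the `NodeStatus` fields, the `pick_node` function, and the 0.9 threshold are all hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NodeStatus:
    """Snapshot reported by a hypothetical resource monitor for one worker."""
    name: str
    gpu_util: float  # fraction of GPU in use, 0.0 to 1.0 (assumed indicator)
    mem_util: float  # fraction of memory in use, 0.0 to 1.0 (assumed indicator)


def pick_node(nodes: Sequence[NodeStatus], max_util: float = 0.9) -> Optional[NodeStatus]:
    """Return the least-loaded node, or None if every node is at or above max_util."""
    # Score each node by its busiest resource, so one saturated resource
    # is enough to exclude the node from scheduling.
    candidates = [n for n in nodes if max(n.gpu_util, n.mem_util) < max_util]
    if not candidates:
        return None
    return min(candidates, key=lambda n: max(n.gpu_util, n.mem_util))


nodes = [
    NodeStatus("worker-a", gpu_util=0.80, mem_util=0.40),
    NodeStatus("worker-b", gpu_util=0.30, mem_util=0.55),
    NodeStatus("worker-c", gpu_util=0.95, mem_util=0.10),
]
chosen = pick_node(nodes)
print(chosen.name if chosen else "no node available")
```

    Scoring by the busiest resource is only one possible choice; a weighted sum of several indicators would fit the same interface.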

Get Citation

Gao Guoliang, Chen Leifang, Liu Yiming. Distributed Task Execution System for Deep Learning. Computer Systems & Applications, 2021, 30(7): 80-86

History
  • Received: November 03, 2020
  • Revised: December 02, 2020
  • Online: July 02, 2021