Received: March 11, 2021; Revised: April 7, 2021
Abstract: In recent years, deep reinforcement learning has achieved great success in a range of sequential decision-making problems, which makes it possible to provide effective, optimized decision-making policies for complex, high-dimensional multi-agent systems. In complex multi-agent scenarios, however, existing multi-agent deep reinforcement learning algorithms converge slowly, and their stability cannot be guaranteed. This paper proposes a value-distribution-based algorithm called multi-agent distributed distributional deep deterministic policy gradient (MA-D4PG). It brings the idea of value distribution into multi-agent scenarios and retains the complete distribution of the expected return, so that agents receive a more stable and effective learning signal. It also introduces a multi-step return to improve the stability of the algorithm. In addition, it uses a distributed data generation framework that decouples experience data generation from network updates, so that computing resources are fully used and convergence is accelerated. Experiments show that the proposed algorithm achieves better stability and faster convergence in multiple continuous and discrete control multi-agent scenarios, and that the decision-making ability of the agents is also clearly improved.
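The multi-step return mentioned in the abstract can be illustrated with a minimal sketch. This is the standard scalar n-step return, not necessarily the exact formulation used in MA-D4PG, where the return is combined with a value distribution. The function name `n_step_return`, its parameters, and the default discount factor are assumptions made for illustration only.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Standard n-step return (illustrative; names are hypothetical).

    rewards: the n rewards r_t, ..., r_{t+n-1} observed along a trajectory.
    bootstrap_value: a value estimate for the state reached after n steps.
    gamma: discount factor (0.99 is an assumed default, not from the paper).

    Computes r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1}
             + gamma^n * bootstrap_value.
    """
    g = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `n = 1` this reduces to the usual one-step target; larger `n` relies more on observed rewards and less on the bootstrapped estimate.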
Keywords: multi-agent deep reinforcement learning; value distribution; multi-step return; distributed data generation
Funding: Pre-research Fund of the University of Science and Technology of China (YZ2101900004)
Citation:
CHEN Miao-Yun, WANG Lei, SHENG Jie. Multi-agent Distributed Deep Reinforcement Learning Algorithm Based on Value Distribution. COMPUTER SYSTEMS APPLICATIONS, 2022, 31(1): 145-151