Multi-agent Distributed Deep Reinforcement Learning Algorithm Based on Value Distribution

doi:10.15888/j.cnki.csa.008237

Funding: Pre-research Fund of the University of Science and Technology of China (YZ2101900004)

Abstract:

In recent years, deep reinforcement learning has achieved great success in a range of sequential decision-making problems, making it possible to provide effective, optimized decision-making policies for complex, high-dimensional multi-agent systems. In complex multi-agent scenarios, however, existing multi-agent deep reinforcement learning algorithms converge slowly, and their stability cannot be guaranteed. This paper proposes a multi-agent distributed distributional deep deterministic policy gradient algorithm (MA-D4PG). The algorithm brings the idea of value distributions to multi-agent scenarios and retains the complete distributional information of the return, so that agents receive a more stable and effective learning signal. It uses multi-step returns to improve stability. It also adopts a distributed data generation framework that decouples experience generation from network updates, so that computing resources are fully used and convergence is accelerated. Experiments show that the proposed algorithm achieves better stability and faster convergence in several multi-agent scenarios with continuous or discrete control, and that the agents' decision-making ability improves markedly.
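The abstract does not spell out how the value distribution or the multi-step return is computed. As a rough illustration only, the sketch below assumes a categorical value distribution over fixed atoms, as in the original single-agent D4PG, and shows how an n-step return can shift that distribution before projecting it back onto the atoms. The function names `n_step_return` and `project_categorical`, and all parameter names, are hypothetical and not taken from the paper.

```python
import math


def n_step_return(rewards, gamma, bootstrap_value=0.0):
    """Discounted sum of the given rewards plus a discounted bootstrap value.

    Computes r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def project_categorical(probs, atoms, reward_sum, discount):
    """Project the distribution of reward_sum + discount * Z onto fixed atoms.

    Assumption: Z is categorical with probabilities `probs` on evenly spaced,
    increasing `atoms` (the C51/D4PG-style parameterization). Each shifted atom
    is clipped to the support and its mass is split between its two neighbors.
    """
    n = len(atoms)
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for p, z in zip(probs, atoms):
        tz = min(max(reward_sum + discount * z, v_min), v_max)
        b = (tz - v_min) / delta              # fractional index of the shifted atom
        lo = min(int(math.floor(b)), n - 1)   # lower neighboring atom
        hi = min(lo + 1, n - 1)               # upper neighboring atom
        w_hi = b - lo                         # share of mass given to the upper atom
        projected[lo] += p * (1.0 - w_hi)
        projected[hi] += p * w_hi
    return projected


# Three rewards of 1 with gamma = 0.5 and a bootstrap value of 4.
three_step = n_step_return([1.0, 1.0, 1.0], gamma=0.5, bootstrap_value=4.0)

# All mass on the atom z = 1, shifted by a reward sum of 0.25 with discount 1.
target = project_categorical([0.0, 1.0, 0.0], [0.0, 1.0, 2.0], 0.25, 1.0)
```

In an n-step distributional target, `reward_sum` would be the discounted sum of the n rewards and `discount` would be gamma raised to the power n. A critic would then be trained to match the projected distribution. How MA-D4PG conditions its critic on the other agents is not stated in the abstract and is not modeled here.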

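Cite this article

Chen Miaoyun (陈妙云), Wang Lei (王雷), Sheng Jie (盛捷). Multi-agent Distributed Deep Reinforcement Learning Algorithm Based on Value Distribution. Computer Systems & Applications (计算机系统应用), 2022, 31(1): 145-151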

History

Received: 2021-03-11
Last revised: 2021-04-07
Published online: 2021-12-17