Abstract: In recent years, deep reinforcement learning has achieved great success in many sequential decision-making problems, making it possible to provide effective, optimized decision-making strategies for complex, high-dimensional multi-agent systems. However, in complex multi-agent scenarios, existing multi-agent deep reinforcement learning algorithms converge slowly, and their stability cannot be guaranteed. Herein, we propose a new multi-agent deep reinforcement learning algorithm called multi-agent distributed distributional deep deterministic policy gradient (MA-D4PG). We adapt the idea of the value distribution to multi-agent scenarios and retain the complete distribution of the return rather than only its expectation, so that agents obtain a more stable and effective learning signal. We also introduce a multi-step return to improve the stability of the algorithm. In addition, we use a distributed data generation framework that decouples experience generation from network updates, making full use of computing resources to speed up convergence. Experiments show that the proposed method is more stable and converges faster in multiple multi-agent scenarios with continuous and discrete control, and that the decision-making ability of the agents is significantly enhanced.
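To make the multi-step return mentioned above concrete, the following is a minimal sketch of a generic N-step bootstrapped target in Python. It is not taken from the MA-D4PG implementation: the function name `n_step_return`, its parameters, and the use of a scalar bootstrap value are illustrative assumptions. In a distributional setting the same shift-and-scale would be applied to the support of the return distribution rather than to a single scalar.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Return r_0 + gamma*r_1 + ... + gamma^(N-1)*r_(N-1) + gamma^N * bootstrap_value.

    Hypothetical helper for illustration only:
    rewards         -- the N rewards observed after the current step
    bootstrap_value -- the critic's estimate at the state reached after N steps
    gamma           -- discount factor in [0, 1]
    """
    target = bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


if __name__ == "__main__":
    # Three rewards of 1.0, a bootstrap estimate of 10.0, discount 0.5.
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

With an empty reward sequence the function simply returns the bootstrap value, and with a single reward it reduces to the usual one-step target.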