Computer Systems & Applications (计算机系统应用), 2024, 33(11): 79-89

Optimized Architecture for Cooperative Multi-agent Reinforcement Learning

LIU Wei (刘玮), CHENG Xu (程旭), LI Hao-Yuan (李浩源)

(School of Computer Science & School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China)

Received: February 27, 2024; Revised: May 6, 2024

Abstract: Many real-world tasks require multiple agents to collaborate, often under limited communication and incomplete observations. Deep multi-agent reinforcement learning (Deep-MARL) algorithms have proven effective in such challenging scenarios. Among them, QTRAN and QTRAN++ are representative approaches that can learn a broad class of joint action-value functions while retaining strong theoretical guarantees. However, their performance is limited by reliance on a single joint action-value estimator and by the lack of preprocessing of agent observations. This study introduces a new algorithm, OPTQTRAN, which significantly improves on the performance of QTRAN and QTRAN++. First, it adopts a dual joint action-value estimator structure in which a decomposition network module computes an additional joint action-value. To ensure that the joint action-values are computed accurately, it designs an adaptive network module that facilitates value function learning. In addition, it introduces a multi-unit network that groups each agent's observations into different units to estimate the agents' utility functions effectively. Experiments across multiple scenarios of the widely used StarCraft benchmark show that the proposed approach outperforms state-of-the-art MARL methods.

Keywords: reinforcement learning (RL); intelligent game; multi-agent reinforcement learning (MARL); agent collaboration
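The abstract does not specify how the decomposition network module is built. The sketch below only illustrates the general idea it relies on: a joint action-value assembled from per-agent utilities. It assumes the simplest additive (VDN-style) form, which is not claimed to be OPTQTRAN's module. The utility numbers and the function names `greedy_actions`, `decomposed_joint_value`, and `best_joint_action` are hypothetical.

```python
from itertools import product


def greedy_actions(utilities):
    """Each agent independently picks the action with its highest utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in utilities)


def decomposed_joint_value(utilities, joint_action):
    """Additive decomposition (an assumption): sum of per-agent utilities."""
    return sum(q[a] for q, a in zip(utilities, joint_action))


def best_joint_action(utilities):
    """Brute-force argmax over all joint actions (exponential; for checking only)."""
    action_sets = [range(len(q)) for q in utilities]
    return max(
        product(*action_sets),
        key=lambda joint: decomposed_joint_value(utilities, joint),
    )


# Hypothetical utilities Q_i(o_i, a) for three agents with 3, 2 and 3 actions.
utilities = [
    [0.1, 0.7, 0.2],
    [0.5, 0.4],
    [-1.0, 0.0, 2.0],
]

decentralized = greedy_actions(utilities)    # each agent acts on its own utility
centralized = best_joint_action(utilities)   # search over the joint action space
print(decentralized, centralized)
```

With an additive decomposition, the per-agent greedy choices coincide with the maximizer of the decomposed joint value, so agents can act on local observations alone at execution time. Methods in the QTRAN family aim to keep this consistency for a broader class of joint action-value functions than a plain sum can represent.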

Author Name	Affiliation	E-mail
LIU Wei	School of Computer Science & School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China	2821908422@qq.com
CHENG Xu	School of Computer Science & School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China
LI Hao-Yuan	School of Computer Science & School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China

Cite this article:
LIU Wei, CHENG Xu, LI Hao-Yuan. Optimized Architecture for Cooperative Multi-agent Reinforcement Learning. Computer Systems & Applications, 2024, 33(11): 79-89