Optimized Architecture for Cooperative Multi-agent Reinforcement Learning
Authors: Liu Wei, Cheng Xu, Li Haoyuan

    Abstract:

    Numerous real-world tasks require the collaboration of multiple agents, often under limited communication and incomplete observations. Deep multi-agent reinforcement learning (Deep-MARL) algorithms show remarkable effectiveness in tackling such challenging scenarios. Among these algorithms, QTRAN and QTRAN++ are representative approaches capable of learning a broad class of joint action-value functions with strong theoretical guarantees. However, the performance of QTRAN and QTRAN++ is hindered by their reliance on a single joint action-value estimator and their neglect of preprocessing agent observations. This study introduces a novel algorithm called OPTQTRAN, which significantly improves on the performance of QTRAN and QTRAN++. First, the study proposes a dual joint action-value estimator structure that leverages a decomposition network module to compute additional joint action-values. To ensure accurate computation of the joint action-values, it designs an adaptive network module that facilitates efficient value function learning. Additionally, it introduces a multi-unit network that groups agent observations into different units for effective estimation of each agent's utility function. Extensive experiments conducted on the widely used StarCraft benchmark across diverse scenarios demonstrate that the proposed approach outperforms state-of-the-art MARL methods.
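    The abstract only names the architectural pieces, so the sketch below illustrates, under stated assumptions, how a dual joint action-value structure of this kind might be wired up: per-agent utility networks feed both an additive decomposition-style estimate and a centralized joint estimator conditioned on the global state. This is a minimal PyTorch-style sketch; the class names, layer sizes, and input choices are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a dual joint action-value structure (not the paper's code):
# per-agent utility networks feed (i) an additive decomposition-style estimate and
# (ii) a centralized joint estimator conditioned on the global state.
import torch
import torch.nn as nn


class AgentUtilityNet(nn.Module):
    """Per-agent utility Q_i(o_i, .) computed from a local observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # (batch, n_actions)


class DualJointEstimator(nn.Module):
    """Two joint action-value estimates: a sum of the chosen per-agent utilities
    (decomposition-style) and a centralized estimate over those utilities plus
    the global state."""
    def __init__(self, n_agents: int, obs_dim: int, n_actions: int, state_dim: int):
        super().__init__()
        self.agents = nn.ModuleList(
            [AgentUtilityNet(obs_dim, n_actions) for _ in range(n_agents)]
        )
        self.joint = nn.Sequential(
            nn.Linear(n_agents + state_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, obs, actions, state):
        # obs: (batch, n_agents, obs_dim); actions: (batch, n_agents), long dtype
        # state: (batch, state_dim)
        chosen = []
        for i, agent in enumerate(self.agents):
            q_i = agent(obs[:, i])                               # (batch, n_actions)
            chosen.append(q_i.gather(1, actions[:, i:i + 1]))    # (batch, 1)
        chosen = torch.cat(chosen, dim=1)                        # (batch, n_agents)
        q_sum = chosen.sum(dim=1, keepdim=True)                  # decomposition-style estimate
        q_joint = self.joint(torch.cat([chosen, state], dim=1))  # centralized estimate
        return q_sum, q_joint
```

    In this illustration, the summed term stands in for the decomposition-network output and the second head for the additional joint action-value; the adaptive and multi-unit modules described in the abstract are not modeled here.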

Cite this article

Liu W, Cheng X, Li HY. Optimized Architecture for Cooperative Multi-agent Reinforcement Learning. Computer Systems & Applications, 2024, 33(11): 79–89.

History
  • Received: 2024-02-27
  • Revised: 2024-05-06
  • Published online: 2024-09-24