BiTransformer Memory for Multi-agent Reinforcement Learning
Authors: 马裕博, 周长东, 张志文, 杨培泽, 张博

    Abstract:

    Multi-agent collaboration plays a crucial role in reinforcement learning, focusing on how agents cooperate to achieve common goals. Most collaborative multi-agent algorithms emphasize the construction of cooperation but overlook the strengthening of individual decision-making. To address this issue, this study proposes an online reinforcement learning model, BiTransformer memory (BTM), which not only considers collaboration among multiple agents but also uses a memory module to assist individual decision-making. BTM consists of a BiTransformer encoder and a BiTransformer decoder, used respectively to enhance individual decision-making and collaboration within the multi-agent system. Inspired by human reliance on past experience when making decisions, the BiTransformer encoder introduces a memory attention module that supports the current decision with an explicit library of historical decision-making experience rather than the hidden units of conventional RNN-based methods. In addition, an attention fusion module is proposed to process the current partial observation with the assistance of this historical experience, extracting the information most valuable for decision-making from the environment and thereby further improving the decision-making capability of individual agents. In the BiTransformer decoder, two modules are proposed: a decision attention module and a collaboration attention module. Together they foster potential cooperation among agents by jointly considering the collaborative benefit between the current agent and the agents that have already made their decisions, as well as the history-augmented partial observation. BTM is evaluated on multiple StarCraft scenarios, achieving an average win rate of 93%.
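    As a concrete illustration of the memory attention idea, the following is a minimal PyTorch sketch: each agent keeps an explicit fixed-size FIFO bank of past decision embeddings, and the current observation embedding queries this bank through cross-attention instead of carrying an RNN hidden state. The class name MemoryAttention, the bank size mem_size, and the residual fusion are assumptions made for illustration; the abstract does not specify the authors' actual architecture.

```python
import torch
import torch.nn as nn


class MemoryAttention(nn.Module):
    """Cross-attention from the current observation embedding to an explicit
    FIFO bank of past decision embeddings (one bank per agent)."""

    def __init__(self, embed_dim: int, n_heads: int = 4, mem_size: int = 32):
        super().__init__()
        self.mem_size = mem_size
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

    def forward(self, obs_emb: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # obs_emb: (batch, 1, d) current observation embedding (query);
        # memory:  (batch, mem_size, d) past decision embeddings (keys/values).
        ctx, _ = self.attn(query=obs_emb, key=memory, value=memory)
        return obs_emb + ctx  # residual fusion of history with the current step

    @staticmethod
    def update(memory: torch.Tensor, decision_emb: torch.Tensor) -> torch.Tensor:
        # FIFO update: drop the oldest slot, append the newest decision embedding.
        return torch.cat([memory[:, 1:], decision_emb], dim=1)


if __name__ == "__main__":
    batch, d, mem_size = 8, 64, 32
    mem_attn = MemoryAttention(embed_dim=d, mem_size=mem_size)
    memory = torch.zeros(batch, mem_size, d)  # empty experience bank at episode start
    obs_emb = torch.randn(batch, 1, d)        # embedded local observation
    fused = mem_attn(obs_emb, memory)         # history-aware observation embedding
    memory = MemoryAttention.update(memory, fused.detach())
    print(fused.shape, memory.shape)          # (8, 1, 64) and (8, 32, 64)
```

    Under the same assumptions, the decoder's decision and collaboration attention modules would follow the same cross-attention pattern, with queries taken from the history-augmented observation and keys/values drawn from the embeddings of agents that have already made their decisions.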

Cite this article:

马裕博, 周长东, 张志文, 杨培泽, 张博. BiTransformer memory for multi-agent reinforcement learning. 计算机系统应用 (Computer Systems & Applications), 2024, 33(12): 115–122.
History
  • Received: 2024-05-22
  • Last revised: 2024-06-17
  • Published online: 2024-10-31