BiTransformer Memory for Multi-agent Reinforcement Learning

doi:10.15888/j.cnki.csa.009705


CSTR: 32024.14.csa.009705

Authors: MA Yu-Bo, ZHOU Chang-Dong, ZHANG Zhi-Wen, YANG Pei-Ze, ZHANG Bo

Affiliation: College of Artificial Intelligence, Dalian Maritime University, Dalian 116026, China

Abstract:

Multi-agent collaboration plays a crucial role in reinforcement learning research, which studies how agents cooperate to achieve common goals. Most cooperative multi-agent algorithms emphasize building cooperation but overlook strengthening individual policies. To address this issue, this study proposes an online reinforcement learning model, BiTransformer memory (BTM), which considers collaboration among multiple agents and also uses a memory module to assist individual decision-making. BTM consists of a BiTransformer encoder and a BiTransformer decoder, which enhance individual policies and multi-agent collaboration, respectively. In the BiTransformer encoder, inspired by human reliance on past decision-making experience, a memory attention module supplies historical decision-making experience to the current decision. Unlike conventional RNN-based methods, BTM gives each agent an explicit library of historical decision-making experience rather than hidden units. In addition, an attention fusion module processes the current partial observation with the help of that historical experience, extracting the information in the environment that is most valuable for decision-making and thereby further improving each agent's individual decision-making capability. In the BiTransformer decoder, two modules are proposed: a decision attention module and a collaborative attention module. Together they consider the collaborative benefit between the current agent and the agents that have already made their decisions, as well as partial observations enriched with historical decision-making experience, to foster potential cooperation among agents. BTM is tested in multiple StarCraft scenarios and achieves an average win rate of 93%.

Key words: multi-agent collaboration; online reinforcement learning; partial observation; historical decision-making experience; collaborative benefit; individual policy enhancement
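The abstract contrasts an explicit library of historical decision-making experience with the hidden units of an RNN. The following is a minimal sketch of that idea only, not the authors' implementation: a single query attends over a list of stored embeddings with plain scaled dot-product attention. The function names, the toy two-dimensional embeddings, and the absence of learned query, key, and value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_attention(query, memory):
    """Scaled dot-product attention of one query over an explicit memory bank.

    query:  embedding of the current local observation (list of floats)
    memory: list of stored experience embeddings, each as long as query
    Returns (weights, readout); readout is the weighted sum of memory entries.
    Assumption: no learned projections, the raw embeddings act as keys and values.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * m for q, m in zip(query, entry)) / scale
        for entry in memory
    ]
    weights = softmax(scores)
    readout = [
        sum(w * entry[i] for w, entry in zip(weights, memory))
        for i in range(len(query))
    ]
    return weights, readout


# Hypothetical experience bank: three past decision embeddings for one agent.
memory_bank = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
current_obs = [2.0, 0.0]

weights, readout = memory_attention(current_obs, memory_bank)
print(weights, readout)
```

Because every stored entry stays addressable, the attention weights show which past experiences influence the current decision, whereas an RNN compresses the whole history into one hidden state. A real model would add learned projections and multiple heads.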

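Cite this article

MA Yu-Bo, ZHOU Chang-Dong, ZHANG Zhi-Wen, YANG Pei-Ze, ZHANG Bo. BiTransformer Memory for Multi-agent Reinforcement Learning. Computer Systems & Applications (计算机系统应用), 2024, 33(12): 115-122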

Article metrics

Views: 114
Downloads: 447
HTML views: 284
Citations: 0

History

Received: 2024-05-22
Last revised: 2024-06-17
Published online: 2024-10-31
