Computer Systems Applications, 2023, 32(12): 43-51
Multi-robot Reinforcement Learning Navigation Incorporating Two Levels of Attention
(1. School of Computer Science and Technology, North University of China, Taiyuan 030051, China; 2. Shanxi Key Laboratory of Machine Vision and Virtual Reality (North University of China), Taiyuan 030051, China; 3. Shanxi Province’s Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China)
Received: May 25, 2023; Revised: June 26, 2023
Abstract: In multi-agent reinforcement learning, the complex relationships among agents lead to low learning efficiency and slow convergence. To address this, this study proposes MADDPG-Attention, a method based on a two-level attention mechanism. It adds hard and soft attention to the Critic network of the MADDPG algorithm so that each agent learns which experience of the other agents is worth drawing on, which improves the efficiency of mutual learning among agents. Because single-level soft attention assigns learning weights even to completely irrelevant agents, hard attention is first used to decide whether learning between two agents is necessary, pruning the agents that carry irrelevant information. Soft attention is then used to assess how important learning between two agents is, and learning weights are assigned according to the resulting importance distribution, so that each agent learns from the agents with usable experience. Tests in the cooperative navigation environment of the multi-agent particle environment show that MADDPG-Attention captures complex relationships more clearly and achieves a navigation success rate above 90% in all three environments, effectively improving learning efficiency and accelerating convergence.
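The hard-then-soft ordering described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `two_level_attention`, the thresholded sigmoid gate used for hard attention, and the scaled dot-product scores are all assumptions chosen for illustration, and the real method operates inside a learned Critic network rather than on fixed vectors.

```python
import math


def two_level_attention(query, keys, values, hard_threshold=0.5):
    """Illustrative hard + soft attention for one agent (hypothetical helper).

    query  : feature vector of the learning agent
    keys   : one feature vector per other agent
    values : one experience/value vector per other agent
    Returns (gates, weights, context).
    """
    dim = len(query)
    # Relevance score between the agent and each other agent
    # (scaled dot product; an assumed scoring function).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]

    # Level 1, hard attention (assumed form): a binary gate that decides
    # whether learning from agent j is necessary at all.
    gates = [
        1 if 1.0 / (1.0 + math.exp(-s)) > hard_threshold else 0
        for s in scores
    ]
    kept = [j for j, g in enumerate(gates) if g == 1]

    # Level 2, soft attention: softmax over the kept agents only, so pruned
    # agents receive exactly zero weight instead of a small positive one.
    weights = [0.0] * len(keys)
    if kept:
        top = max(scores[j] for j in kept)  # subtract max for stability
        exps = {j: math.exp(scores[j] - top) for j in kept}
        total = sum(exps.values())
        for j in kept:
            weights[j] = exps[j] / total

    # Weighted sum of the other agents' value vectors.
    width = len(values[0]) if values else 0
    context = [
        sum(w * v[d] for w, v in zip(weights, values)) for d in range(width)
    ]
    return gates, weights, context


gates, weights, context = two_level_attention(
    query=[1.0, 0.0],
    keys=[[2.0, 0.0], [-2.0, 0.0], [1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
)
print(gates, weights, context)
```

In this toy call the second agent has a negative relevance score, so the hard gate removes it and the soft weights are normalized over the remaining two agents only. With soft attention alone, that agent would still receive a small nonzero weight, which is the issue the abstract raises.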
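Funding: National Natural Science Foundation of China (62272426, 62106238); Shanxi Province Science and Technology Major Special Project (202201150401021); Shanxi Province Science and Technology Achievement Transformation Guidance Special Project (202104021301055); Shanxi Province Research Funding Program for Returned Overseas Scholars (2020-113); Shanxi Province Basic Research Program (202203021222027)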
Author Name: ZHANG Yao-Dan
Affiliation: School of Computer Science and Technology, North University of China, Taiyuan 030051, China

Author Name: KUANG Li-Qun
Affiliation: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality (North University of China), Taiyuan 030051, China; Shanxi Province’s Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
E-mail: kuang@nuc.edu.cn

Author Name: JIAO Shi-Chao
Affiliation: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality (North University of China), Taiyuan 030051, China; Shanxi Province’s Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China

Author Name: HAN Hui-Yan
Affiliation: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality (North University of China), Taiyuan 030051, China; Shanxi Province’s Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China

Author Name: XUE Hong-Xin
Affiliation: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality (North University of China), Taiyuan 030051, China; Shanxi Province’s Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
Cite this article:
ZHANG Yao-Dan, KUANG Li-Qun, JIAO Shi-Chao, HAN Hui-Yan, XUE Hong-Xin. Multi-robot Reinforcement Learning Navigation Incorporating Two Levels of Attention. COMPUTER SYSTEMS APPLICATIONS, 2023, 32(12): 43-51