融合两级注意力的多机器人强化学习导航
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

国家自然科学基金(62272426,62106238);山西省科技重大专项计划(202201150401021);山西省科技成果转化引导专项(202104021301055);山西省回国留学人员科研资助项目(2020-113);山西省基础研究计划(202203021222027)


Multi-robot Reinforcement Learning Navigation Incorporating Two Levels of Attention
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    针对多智能体强化学习中因智能体之间的复杂关系所导致的学习效率低及收敛速度慢的问题, 提出基于两级注意力机制的方法MADDPG-Attention, 在MADDPG算法的Critic网络中增加了软硬两级注意力机制, 通过注意力机制学习智能体之间的可借鉴经验, 提升智能体之间的相互学习效率. 由于单层的软注意力机制会给完全不相关的智能体也赋予学习权重, 因此采用硬注意力判断两个智能体之间学习的必要性, 裁减无关信息的智能体, 再用软注意力判断两个智能体间学习的重要性, 按重要性分布来分配学习权重, 据此向有可用经验的智能体学习. 在多智能体粒子的合作导航环境上进行测试, 实验结果表明, MADDPG-Attention算法对复杂关系的理解更为清晰, 在3种环境的导航成功率都达到了90%以上, 有效提高了学习效率, 加快了收敛速度.

    Abstract:

    To solve the low learning efficiency and slow convergence due to the complex relationship among intelligent agents in multi-agent reinforcement learning, this study proposes a two-level attention mechanism based on MADDPG-Attention. The mechanism adds soft and hard attention mechanisms to the Critic network of the MADDPG algorithm and learns the learnable experience among intelligent agents through the attention mechanism to improve the mutual learning efficiency of the agents. Since the single-level soft attention mechanism assigns learning weights to completely irrelevant intelligent agents, hard attention is employed to determine the necessity of learning between two intelligent agents, and the agents with irrelevant information are cut. Then soft attention is adopted to determine the importance of learning between two intelligent agents, and the learning weights are assigned according to the importance distribution to learn from the agents with available experience. Meanwhile, tests on a collaborative navigation environment with multi-agent particles show that the MADDPG-Attention algorithm has a clearer understanding of complex relationships and achieves a success rate of more than 90% in all three environments, which improves the learning efficiency and accelerates the convergence rate.

    参考文献
    相似文献
    引证文献
引用本文

张耀丹,况立群,焦世超,韩慧妍,薛红新.融合两级注意力的多机器人强化学习导航.计算机系统应用,2023,32(12):43-51

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2023-05-25
  • 最后修改日期:2023-06-26
  • 录用日期:
  • 在线发布日期: 2023-09-19
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京海淀区中关村南四街4号 中科院软件园区 7号楼305房间,邮政编码:100190
电话:010-62661041 传真: Email:csa (a) iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号