Improved Multi-agent Evolutionary Reinforcement Learning with Strategy Optimization and Representation Search

    Abstract:

    Multi-agent evolutionary reinforcement learning integrates evolutionary algorithms into multi-agent reinforcement learning, mitigating inherent problems such as low-quality reward signals and non-stationarity. However, existing methods often struggle to balance learning and exploration between reinforcement learning and evolutionary algorithms: on the one hand, poor strategies from reinforcement learning can have a potentially destructive effect when injected into the population; on the other hand, the low utilization of high-quality strategies within the population limits overall learning efficiency. Moreover, in complex partially observable environments, agents struggle to form effective observation representations, which reduces decision-making accuracy. To address these problems, this study proposes an improved multi-agent evolutionary reinforcement learning method with strategy optimization and representation search (SORS). First, to balance learning and exploration, a reward-driven strategy optimization module is designed that uses superior strategies to guide both population mutation in the evolutionary algorithm and gradient optimization in reinforcement learning. Second, to mitigate partial observability in complex environments, a representation search method is introduced that searches for better representations over a population of perturbed representation networks, improving agents' observation representations. Finally, experiments on the StarCraft simulation platform validate the proposed method: SORS achieves superior performance, surpassing all baseline algorithms in average win rate across different environments.
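    The two modules described above can be illustrated with a deliberately simplified, hypothetical sketch (not the authors' implementation): policies and representation encoders are reduced to plain parameter vectors, and the episode return from the StarCraft environment is replaced by a toy fitness function. All names and parameters here (`evaluate`, `mutate`, `search_representation`, `elite_weight`, `sigma`) are illustrative assumptions.

    ```python
    import random

    # Toy stand-in for fitness: in the paper this would be the episode return
    # (e.g. win rate) from the environment; here a policy is just a parameter
    # vector scored by its distance to a fixed, hidden optimum.
    TARGET = [0.5, -0.2, 0.8]

    def evaluate(params):
        return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

    def mutate(params, elite, sigma=0.05, elite_weight=0.3):
        # Reward-driven variation sketch: Gaussian noise plus a pull toward a
        # superior ("elite") policy, loosely mirroring the idea of letting
        # high-fitness strategies guide population mutation.
        return [p + elite_weight * (e - p) + random.gauss(0.0, sigma)
                for p, e in zip(params, elite)]

    def search_representation(encoder_params, k=8, sigma=0.02):
        # Representation-search sketch: spawn k perturbed copies of an
        # encoder's parameters and keep whichever scores best; including the
        # original guarantees the search never degrades the representation.
        candidates = [encoder_params] + [
            [w + random.gauss(0.0, sigma) for w in encoder_params]
            for _ in range(k)
        ]
        return max(candidates, key=evaluate)

    def evolve(pop_size=20, dims=3, generations=60, seed=0):
        random.seed(seed)
        population = [[random.uniform(-1.0, 1.0) for _ in range(dims)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(population, key=evaluate, reverse=True)
            elite = ranked[0]
            survivors = ranked[:pop_size // 2]  # elitist: keep the top half
            mutants = [mutate(random.choice(survivors), elite)
                       for _ in range(pop_size - len(survivors))]
            population = survivors + mutants
        return max(population, key=evaluate)
    ```

    Because the survivors always retain the best individual found so far, the population fitness is monotonically non-decreasing; the elite pull in `mutate` plays the role of the "superior strategies" that the paper uses to guide both population mutation and gradient updates.
    
    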

Cite this article:

陈洪放, 王秋红, 顾晶晶, 张凯. Improved Multi-agent Evolutionary Reinforcement Learning with Strategy Optimization and Representation Search. 计算机系统应用 (Computer Systems & Applications), 2025, 34(12): 26–38.

History
  • Received: 2025-04-28
  • Revised: 2025-05-26
  • Published online: 2025-10-31
Copyright © Institute of Software, Chinese Academy of Sciences