Mobile Robot Path Planning Based on Improved Deep Q-network
(基于改进深度Q网络的移动机器人路径规划算法)

Authors: 谢天, 周毅, 邱宇峰
Fund Project: National Natural Science Foundation of China (62372343)
Abstract:

With the rapid advancement of automation and robotics, mobile robot path planning is subject to increasingly strict precision requirements. To address the poor convergence stability, low sample efficiency, and insufficient environmental adaptability of deep reinforcement learning for path planning in complex environments, this paper proposes an improved path planning algorithm based on the dueling double deep Q-network (R-D3QN). First, a dual-network architecture decouples action selection from value estimation, effectively alleviating Q-value overestimation and improving convergence stability. Second, a temporal-prioritized experience replay mechanism, combined with the spatiotemporal feature extraction capability of long short-term memory (LSTM) networks, improves sample utilization efficiency. Finally, a multi-stage exploration strategy based on simulated annealing balances exploration and exploitation, enhancing environmental adaptability. Experimental results show that, compared with the traditional DQN algorithm, R-D3QN increases the average reward by 9.25%, reduces convergence iterations by 24.39%, and reduces collisions by 41.20% in simple environments; in complex environments, it increases the average reward by 12.98%, reduces convergence iterations by 11.86%, and reduces collisions by 42.14%. R-D3QN also shows clear advantages over other improved DQN algorithms, which further validates the effectiveness of the proposed approach.
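The page does not reproduce the paper's implementation. As a rough sketch of the dual-network idea the abstract describes (an online network selects the next action while a separate target network evaluates it, combined here with a dueling head), the following PyTorch fragment shows one common way such a target is computed. The class name, layer sizes, and discount factor are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)                               # (batch, 1)
        a = self.advantage(h)                           # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)      # (batch, n_actions)

def double_dqn_target(online: nn.Module, target: nn.Module,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Decoupled target: the online net picks the action, the target net scores it."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)   # selection
        next_q = target(next_state).gather(1, best_action).squeeze(1)  # evaluation
        return reward + gamma * (1.0 - done.float()) * next_q          # Bellman target

Keeping action selection and value evaluation in separate networks is what mitigates the Q-value overestimation the abstract refers to.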

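Likewise, the simulated-annealing-based multi-stage exploration strategy is only named in the abstract; the sketch below illustrates one plausible reading, in which an exponentially cooled temperature drives the epsilon-greedy exploration rate and later training stages cap it more tightly. Every constant (cooling rate, stage boundaries, epsilon bounds) is a placeholder assumption, not a value from the paper.

import random

def annealed_epsilon(episode: int,
                     eps_max: float = 1.0, eps_min: float = 0.05,
                     t0: float = 1.0, cooling: float = 0.995,
                     stage_bounds: tuple = (200, 600)) -> float:
    """Exploration rate driven by an exponentially cooled temperature."""
    temperature = t0 * (cooling ** episode)   # simulated-annealing-style cooling
    eps = eps_min + (eps_max - eps_min) * temperature / t0
    # Multi-stage caps: later stages push the policy toward exploitation.
    if episode >= stage_bounds[1]:
        eps = min(eps, 0.10)
    elif episode >= stage_bounds[0]:
        eps = min(eps, 0.30)
    return max(eps, eps_min)

def select_action(q_values: list, episode: int) -> int:
    """Epsilon-greedy action selection using the annealed schedule."""
    if random.random() < annealed_epsilon(episode):
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit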
Cite this article:

谢天, 周毅, 邱宇峰. Mobile Robot Path Planning Based on Improved Deep Q-network. 计算机系统应用 (Computer Systems & Applications): 1–11.

History
  • Received: 2024-11-18
  • Revised: 2025-02-11
  • Published online: 2025-05-23