Received: April 28, 2024; Revised: May 20, 2024
Abstract: As unmanned aerial vehicle (UAV) applications continue to expand, the design of disturbance rejection controllers that ensure UAVs complete their assigned tasks as required has attracted considerable attention. The traditional control algorithms in wide use today offer good stability but poor disturbance rejection. To address this issue, this study proposes a hybrid disturbance rejection controller based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm. The method uses nonlinear model predictive control (NMPC) as the base controller and adds a disturbance compensator based on the improved TD3 for hybrid control, retaining the advantages of NMPC while remedying the weak disturbance rejection of traditional control algorithms. A multi-head attention (MA) mechanism and a long short-term memory (LSTM) network are introduced into the Actor network of TD3, improving its ability to capture spatial correlation and temporal correlation information. In addition, a continuous logarithmic reward function is introduced to improve training stability and convergence speed, and training is conducted on random task scenarios with random disturbances to improve model generalization. In experiments, the NMPC-MALSTM-TD3 architecture is compared with architectures that use DDPG, SAC, TD3, and PPO as the disturbance compensator. The results show that NMPC-MALSTM-TD3 achieves the best overall performance, including the strongest disturbance rejection, while having only a small impact on the stability and real-time performance of NMPC.
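The abstract gives neither the hybrid control law nor the reward formula. As a minimal sketch, assuming the learned compensator's output is simply added to the NMPC command (a common residual-compensation structure) and using r = -ln(1 + ‖p - p_ref‖) as one hypothetical example of a continuous logarithmic reward, the two ideas might look like the Python below. The function names `hybrid_command` and `log_tracking_reward`, the per-channel clipping limits, and the reward form are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import List, Sequence


def hybrid_command(u_nmpc: Sequence[float], u_comp: Sequence[float],
                   u_min: float, u_max: float) -> List[float]:
    """Hypothetical hybrid control law (assumed additive, not from the paper):
    NMPC command plus learned disturbance compensation, clipped per channel
    to the actuator limits [u_min, u_max]."""
    if len(u_nmpc) != len(u_comp):
        raise ValueError("command vectors must have the same length")
    return [min(max(a + b, u_min), u_max) for a, b in zip(u_nmpc, u_comp)]


def log_tracking_reward(position: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Hypothetical continuous logarithmic reward r = -ln(1 + ||p - p_ref||).

    It is 0 at zero tracking error, smooth for all errors, and decreases
    monotonically as the error grows, but only logarithmically, so large
    errors do not dominate the return the way a quadratic penalty would."""
    err = math.dist(position, reference)  # Euclidean tracking error
    return -math.log1p(err)
```

For example, `hybrid_command([0.5, 0.5, 0.5, 0.5], [0.1, -0.2, 0.7, 0.0], 0.0, 1.0)` shows the third rotor command being saturated at the upper limit. Clipping after summation keeps the combined command feasible even if the learned term overshoots.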
Keywords: deep reinforcement learning; nonlinear model predictive control; TD3; multi-head attention; LSTM
Author Name | Affiliation | Email |
XU Bo-Yang | College of Computer Science, Sichuan University, Chengdu 610065, China | |
SHI Hong-Wei | College of Computer Science, Sichuan University, Chengdu 610065, China | shihw001@126.com |
Citation:
XU Bo-Yang, SHI Hong-Wei. Disturbance Rejection Control of Quadrotor UAVs Based on Deep Reinforcement Learning. COMPUTER SYSTEMS APPLICATIONS,,():1-12