基于深度强化学习的四旋翼无人机抗扰控制

doi:10.15888/j.cnki.csa.009675

微信公众号

网站二维码

首页 > 过刊浏览>2024年第33卷第12期 >185-196. DOI:10.15888/j.cnki.csa.009675

PDF HTML阅读 XML下载导出引用引用提醒

基于深度强化学习的四旋翼无人机抗扰控制
DOI:
                        10.15888/j.cnki.csa.009675
                    
作者:
                        
                        
                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:

Disturbance Rejection Control of Quadrotor UAVs Based on Deep Reinforcement Learning

Author:

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献

相似文献

引证文献

资源附件

文章评论

摘要:

随着无人机应用需求不断拓展, 为了保证无人机能够按要求完成预定任务, 抗干扰控制器的设计受到了诸多关注. 目前广泛使用的传统控制算法稳定性较好但抗干扰能力较差. 针对上述问题, 提出了一种基于改进双延迟深度确定性策略梯度(TD3)算法的混合抗干扰控制器, 该方法使用非线性模型预测控制(NMPC)作为基础控制器, 并引入了一个基于改进TD3的干扰补偿器进行混合控制. 该方法结合了NMPC控制器的优点的同时解决了传统控制算法在抗干扰方面的不足. 本文将多头注意力机制(MA)以及长短期记忆网络(LSTM)引入TD3的Actor网络中, 提高了TD3对于空间管理信息以及时间关联信息的捕捉能力, 同时引入一种连续型对数奖励函数来提高训练稳定性和收敛速度, 并使用带随机干扰的随机任务场景进行训练以提高模型泛化性. 在实验中将NMPC-MALSTM-TD3架构与使用DDPG、SAC、TD3、PPO算法作为干扰补偿器的架构进行对比, 实验结果表明, NMPC-MALSTM-TD3架构的综合表现最好, 而且对NMPC的稳定性和实时性影响较小.

Abstract:

As the demand for unmanned aerial vehicle (UAV) applications continues to expand, the design of disturbance rejection controllers which aim to ensure that UAVs can complete designated tasks as required has received significant attention. Traditional control algorithms widely used currently exhibit good stability but poor disturbance rejection capability. To address this issue, a hybrid disturbance rejection controller based on an improved twin delayed deep deterministic (TD3) policy gradient algorithm is proposed. This method utilizes nonlinear model predictive control (NMPC) as the base controller and introduces a disturbance compensator based on improved TD3 for hybrid control. This approach combines the advantages of the NMPC controller as well as addresses the shortcomings in disturbance rejection of traditional control algorithms. This study introduces a multi-head attention (MA) mechanism and long short-term memory (LSTM) network into the Actor network of TD3, enhancing TD3’s ability to capture spatial management information and temporal correlation information. Additionally, a continuous logarithmic reward function is introduced to improve training stability and convergence speed, and training is conducted using random task scenarios with random disturbances to enhance model generalization. In experiments, the NMPC-MALSTM-TD3 architecture is compared with architectures using DDPG, SAC, TD3, and PPO algorithms as disturbance compensators. Experimental results demonstrate that the NMPC-MALSTM-TD3 architecture exhibits the most excellent disturbance rejection capabilities and a smaller influence on the stability and real-time performance of NMPC.

参考文献

相似文献

引证文献

引用本文

徐博洋,时宏伟.基于深度强化学习的四旋翼无人机抗扰控制.计算机系统应用,2024,33(12):185-196

复制

文章指标

点击次数:
下载次数:
HTML阅读次数:
引用次数:

历史

收稿日期:2024-04-28
最后修改日期:2024-05-20
录用日期:
在线发布日期: 2024-09-24
出版日期:

微信公众号

网站二维码

引用本文

分享

文章指标

历史

文章二维码