Abstract: As the demand for unmanned aerial vehicle (UAV) applications continues to expand, the design of disturbance rejection controllers that ensure UAVs complete their designated tasks has received significant attention. The traditional control algorithms in wide use today offer good stability but poor disturbance rejection. To address this issue, a hybrid disturbance rejection controller based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. The method uses nonlinear model predictive control (NMPC) as the base controller and adds a disturbance compensator based on the improved TD3 to form a hybrid controller. This design retains the advantages of NMPC while addressing the weak disturbance rejection of traditional control algorithms. A multi-head attention (MA) mechanism and a long short-term memory (LSTM) network are introduced into the Actor network of TD3, enhancing TD3’s ability to capture spatial information and temporal correlations. Additionally, a continuous logarithmic reward function is introduced to improve training stability and convergence speed, and training is conducted on random task scenarios with random disturbances to improve model generalization. In experiments, the NMPC-MALSTM-TD3 architecture is compared with architectures that use DDPG, SAC, TD3, and PPO as disturbance compensators. The results show that the NMPC-MALSTM-TD3 architecture achieves the best disturbance rejection among the compared architectures while having a smaller impact on the stability and real-time performance of NMPC.