Abstract:Considering the internal driving mechanism of behavior decision-making and state changes of multiple UAVs, a collaborative trajectory planning method based on decision-making knowledge learning is proposed from the perspective of information processing. Firstly, the behavior states of UAVs are represented by knowledge on the basis of the Markov decision process, and the decision-making knowledge on continuous action space is developed. Then, a deep deterministic policy gradient (DDPG) algorithm based on decision-making knowledge learning is presented to achieve the collaborative planning of UAVs on the decision-making knowledge level. The experimental results reveal that on the basis of developing a demonstration system, the method can obtain an optimal trajectory planning strategy by reinforcement learning and can simultaneously achieve the convergence and stability of the comprehensive evaluation and average reward of trajectories, which provides decision-making support for mission execution of UAVs.