Computer Systems Applications, 2023, 32(4): 293-299
Intelligent Wargame Deduction Based on BN-DDPG Lightweight Reinforcement Learning Algorithm
(College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Received: August 25, 2022    Revised: September 27, 2022
Abstract: Integrating wargaming with intelligent algorithms has become a research hotspot in military applications. Using deep reinforcement learning (DRL) to make the decision-making process in simulation deduction intelligent can significantly reduce the influence of human experience on decisions and improve deduction efficiency and flexibility. However, existing DRL-based decision models take too long to train and consume too much computing power to meet the real-time requirements of combat tasks. This study proposes an intelligent deduction method based on the lightweight binary neural network-deep deterministic policy gradient (BN-DDPG) algorithm. Following the deduction rules, decision behavior during deduction is described as a Markov decision process, and an agent training network is built on the actor-critic architecture, in which the actor network is a custom hybrid binary neural network that reduces the amount of computation. In addition, a double-buffer-pool structure is built according to the state and return of experience samples, and samples are drawn with priority given to environmental similarity, which improves training efficiency. Finally, the method is validated through a case study on a self-developed simulation deduction platform. The results show that the BN-DDPG algorithm simplifies model training, accelerates model convergence, and significantly improves the accuracy of deduction decisions.
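The double-buffer-pool sampling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how samples are assigned to the two pools or how environmental similarity is measured. The class `DualReplayPool`, the `reward_threshold` split, the Euclidean distance measure, and the `high_fraction` parameter are all assumptions made for illustration only.

```python
import math
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    state: tuple
    action: tuple
    reward: float
    next_state: tuple


class DualReplayPool:
    """Two experience buffers with similarity-prioritized sampling (illustrative)."""

    def __init__(self, capacity: int = 10_000, reward_threshold: float = 0.0):
        # Assumption: a transition is routed by comparing its reward to a fixed threshold.
        self.reward_threshold = reward_threshold
        self.high = deque(maxlen=capacity)  # transitions with reward >= threshold
        self.low = deque(maxlen=capacity)   # all remaining transitions

    def add(self, t: Transition) -> None:
        target = self.high if t.reward >= self.reward_threshold else self.low
        target.append(t)

    def sample(self, current_state, batch_size: int, high_fraction: float = 0.5):
        """Return up to batch_size transitions whose states lie closest to current_state."""
        # Assumption: similarity is measured as Euclidean distance between state vectors.
        def distance(t: Transition) -> float:
            return math.dist(t.state, current_state)

        n_high = min(len(self.high), int(batch_size * high_fraction))
        n_low = min(len(self.low), batch_size - n_high)
        # Take the most similar transitions from each pool, nearest first.
        return sorted(self.high, key=distance)[:n_high] + sorted(self.low, key=distance)[:n_low]


pool = DualReplayPool(reward_threshold=1.0)
pool.add(Transition((0.0, 0.0), (0.1,), 2.0, (0.1, 0.0)))
pool.add(Transition((5.0, 5.0), (0.2,), 3.0, (5.1, 5.0)))
pool.add(Transition((0.2, 0.1), (0.3,), 0.0, (0.3, 0.1)))
pool.add(Transition((9.0, 9.0), (0.4,), -1.0, (9.1, 9.0)))
batch = pool.sample(current_state=(0.0, 0.0), batch_size=2)
```

In this sketch the batch mixes the nearest transition from each pool, so training sees both high-return and ordinary experience gathered in situations resembling the current one. A stochastic variant that samples with probability weighted by similarity would be an equally plausible reading of the abstract.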
Funding: National Defense Basic Scientific Research Fund (JCKY2020605C003)
Cite this article:
LI Zhuo-Yuan, ZHANG De-Ping. Intelligent Wargame Deduction Based on BN-DDPG Lightweight Reinforcement Learning Algorithm. Computer Systems Applications, 2023, 32(4): 293-299