Intelligent Wargame Deduction Based on BN-DDPG Lightweight Reinforcement Learning Algorithm

doi:10.15888/j.cnki.csa.009015


2023, Vol. 32, No. 4, pp. 293-299

Fund Project: National Defense Basic Scientific Research Fund (JCKY2020605C003)

Intelligent Wargame Deduction Based on BN-DDPG Lightweight Reinforcement Learning Algorithm



Abstract:

Combining wargaming with intelligent algorithms has become a research hotspot in military applications. Using deep reinforcement learning (DRL) to automate decision-making in simulated deduction can significantly reduce the influence of human experience on decisions and improve the efficiency and flexibility of deduction. Existing DRL-based decision models, however, take too long to train and consume too much computing power to meet the real-time requirements of combat tasks. This study proposes an intelligent deduction method based on a lightweight deep deterministic policy gradient algorithm built on a binary neural network (BN-DDPG). Following the deduction rules, the decision behavior during deduction is described as a Markov decision process. The agent training network is built on the actor-critic architecture, and the actor network uses a custom hybrid binary neural network to reduce the amount of computation. In addition, a double-buffer-pool structure is built according to the state and return value of each experience sample, and samples are drawn with priority given to environment similarity, which improves training efficiency. Finally, the method is validated with a case study on a self-developed simulation deduction platform. The results show that the BN-DDPG algorithm simplifies model training, accelerates model convergence, and significantly improves the accuracy of deduction decisions.
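The abstract does not give the details of the double-buffer-pool structure or of the similarity-based sampling, so the following is only a minimal sketch of the general idea. Everything specific in it is an assumption made for illustration and is not taken from the paper: the class name `DualReplayBuffer`, routing transitions into two pools by a reward threshold, measuring environment similarity as the cosine similarity between a stored state and the current state, and deterministically taking the most similar transitions from each pool.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class DualReplayBuffer:
    """Two FIFO pools of transitions (hypothetical design, not the paper's).

    Assumption: a transition goes to the `high` pool when its reward is at
    least `reward_threshold`, otherwise to the `low` pool.
    """

    def __init__(self, capacity=10000, reward_threshold=0.0, high_fraction=0.5):
        self.high = deque(maxlen=capacity)
        self.low = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold
        self.high_fraction = high_fraction  # share of each batch taken from `high`

    def add(self, state, action, reward, next_state, done):
        pool = self.high if reward >= self.reward_threshold else self.low
        pool.append((state, action, reward, next_state, done))

    @staticmethod
    def _most_similar(pool, current_state, k):
        # Rank stored transitions by how similar their state is to the
        # current state, and keep the k most similar ones.
        ranked = sorted(
            pool,
            key=lambda t: cosine_similarity(t[0], current_state),
            reverse=True,
        )
        return ranked[:k]

    def sample(self, current_state, batch_size):
        # Take up to `high_fraction` of the batch from the high-reward pool
        # and fill the remainder from the low-reward pool.
        k_high = min(len(self.high), int(batch_size * self.high_fraction))
        k_low = min(len(self.low), batch_size - k_high)
        return (self._most_similar(self.high, current_state, k_high)
                + self._most_similar(self.low, current_state, k_low))
```

A training loop would call `add` after every environment step and `sample(current_state, batch_size)` before each network update. A practical variant might sample with probability proportional to similarity instead of always taking the top entries, so that less similar experiences are still replayed occasionally.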


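Cite this article

Li Zhuoyuan (李卓远), Zhang Deping (张德平). Intelligent wargame deduction based on BN-DDPG lightweight reinforcement learning algorithm. Computer Systems & Applications (计算机系统应用), 2023, 32(4): 293-299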


History

Received: 2022-08-25
Last revised: 2022-09-27
Published online: 2023-03-17
