Samples Expanding of Deep Q Network Based on Genetic Crossover Operator

DOI: 10.15888/j.cnki.csa.008200


Computer Systems & Applications, 2021, Vol. 30, No. 12, pp. 155-162

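Authors:
YANG Tong, College of Computer Science & Technology, Guizhou University, Guiyang 550025, China
QIN Jin, College of Computer Science & Technology, Guizhou University, Guiyang 550025, China
XIE Zhong-Tao, College of Computer Science & Technology, Guizhou University, Guiyang 550025, China
YUAN Lin-Lin, College of Information Engineering, Guizhou Open University, Guiyang 550025, China

Fund Project: National Natural Science Foundation of China (61562009); Science and Technology Program of Guizhou Province (Qiankehe Jichu [2019] No. 1130)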


Abstract:

Traditional deep reinforcement learning trains on state transitions sampled one at a time from the experience replay buffer. This paper instead targets the Deep Q Network (DQN) variant that uses entire episode trajectories as training samples, and proposes a method that expands the set of episode samples with the crossover operator of the genetic algorithm. Episode trajectories are produced by the agent's trial-and-error interaction with the environment, so different trajectories may pass through similar key states. Using a pair of similar states in two trajectories as the crossover point produces trajectories that have not yet been observed. This increases both the number and the diversity of episode samples, which in turn strengthens the agent's exploration and improves sample efficiency. Compared with DQN, which samples training transitions at random, and with Episodic Backward Update (EBU), which updates backward along sampled episodes, the proposed method achieves higher rewards on Atari 2600 video games.

Key words: deep reinforcement learning; experience replay; sample efficiency; genetic algorithm
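The crossover idea described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the transition tuple layout, the Euclidean distance with a `threshold` used to decide that two states are "similar", and the helper names `find_crossover_point`, `crossover` and `ebu_targets` are all hypothetical. The last function shows how a generated episode could then be consumed by a backward update in the spirit of EBU, where each step's target reuses the already-computed target of the following step, blended with the target network's estimate by a diffusion factor `beta`.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a transition is
# (state, action, reward, next_state, done); states are tuples of floats.
State = Tuple[float, ...]
Transition = Tuple[State, int, float, State, bool]
Episode = List[Transition]


def state_distance(a: State, b: State) -> float:
    """Euclidean distance, an assumed stand-in for the state-similarity test."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_crossover_point(ep1: Episode, ep2: Episode,
                         threshold: float) -> Optional[Tuple[int, int]]:
    """Return (i, j) for the closest pair of states ep1[i] / ep2[j] whose
    distance is below `threshold`, or None if no such pair exists.
    Index 0 is skipped so each child keeps at least one transition
    from each parent."""
    best: Optional[Tuple[int, int]] = None
    best_d = threshold
    for i in range(1, len(ep1)):
        for j in range(1, len(ep2)):
            d = state_distance(ep1[i][0], ep2[j][0])
            if d < best_d:
                best, best_d = (i, j), d
    return best


def crossover(ep1: Episode, ep2: Episode, threshold: float) -> List[Episode]:
    """Single-point crossover of two episodes at a pair of similar states.
    Returns two offspring episodes, or [] if no similar pair exists."""
    point = find_crossover_point(ep1, ep2, threshold)
    if point is None:
        return []
    i, j = point
    child1 = ep1[:i] + ep2[j:]  # head of ep1 joined to tail of ep2
    child2 = ep2[:j] + ep1[i:]  # head of ep2 joined to tail of ep1
    return [child1, child2]


def ebu_targets(episode: Episode, q_next: Sequence[Sequence[float]],
                gamma: float = 0.99, beta: float = 0.5) -> List[float]:
    """Backward targets in the style of Episodic Backward Update.
    q_next[k][a] is the target network's estimate Q(s'_k, a) for transition k.
    beta is the diffusion factor; beta = 0 reduces to one-step DQN targets."""
    T = len(episode)
    q_tilde = [list(row) for row in q_next]  # temporary table, modified below
    y = [0.0] * T
    _, _, r_last, _, done_last = episode[-1]
    y[-1] = r_last if done_last else r_last + gamma * max(q_tilde[-1])
    for k in range(T - 2, -1, -1):
        # Action taken in s_{k+1}; it is assumed that s'_k = s_{k+1}, which
        # holds only approximately at a crossover junction.
        a_next = episode[k + 1][1]
        q_tilde[k][a_next] = beta * y[k + 1] + (1 - beta) * q_tilde[k][a_next]
        y[k] = episode[k][2] + gamma * max(q_tilde[k])
    return y
```

For example, two toy episodes that pass through the nearly identical states (1.0, 0.0) and (1.05, 0.0) are crossed at that pair with `threshold=0.2`, yielding two new episodes; with `beta=1.0`, the terminal reward of the borrowed tail propagates back to the first step of the child in a single backward pass, whereas `beta=0.0` leaves the earlier targets at the target network's estimates.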

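Citation:

YANG Tong, QIN Jin, XIE Zhong-Tao, YUAN Lin-Lin. Samples Expanding of Deep Q Network Based on Genetic Crossover Operator. Computer Systems & Applications, 2021, 30(12): 155-162.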


History:

Received: 2021-02-22
Last revised: 2021-03-19
Published online: 2021-12-10
