Abstract: Unlike traditional deep reinforcement learning methods that train on transitions sampled one by one from the experience replay, a Deep Q-Network (DQN) can instead use an entire episode trajectory as the training sample. For such episode-based training, a method for expanding episode samples is proposed, based on the crossover operator of genetic algorithms. Episode trajectories are generated during the agent's trial-and-error interaction with the environment, and similar key states are encountered across different trajectories. Using a similar state shared by two episode trajectories as the crossover point, new episode trajectories that have not yet been observed can be generated, enlarging the set of episode samples and increasing their diversity, thereby enhancing the agent's exploration ability and improving sample efficiency. Compared with DQN using random sampling and with the Episodic Backward Update (EBU) algorithm, the proposed method achieves higher rewards on Atari 2600 games.
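To make the crossover idea concrete, the sketch below shows one possible way to splice two episode trajectories at a pair of similar states. The `find_crossover_point` helper, the Euclidean similarity threshold, and the transition-tuple layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def find_crossover_point(ep_a, ep_b, threshold=1e-3):
    """Return indices (i, j) of the first pair of sufficiently similar
    states in two episodes, or None if no such pair exists.

    ep_a, ep_b: lists of (state, action, reward, next_state, done) tuples,
    with states as NumPy arrays. `threshold` (an assumed value) bounds the
    Euclidean distance for two states to count as "similar".
    """
    for i, (s_a, *_) in enumerate(ep_a):
        for j, (s_b, *_) in enumerate(ep_b):
            if np.linalg.norm(s_a - s_b) < threshold:
                return i, j
    return None

def crossover_episodes(ep_a, ep_b, threshold=1e-3):
    """Exchange the tails of two episodes at a similar-state crossover
    point, producing two child trajectories that did not appear before."""
    point = find_crossover_point(ep_a, ep_b, threshold)
    if point is None:
        return []  # no similar state found, so no new episodes are created
    i, j = point
    child_1 = ep_a[:i] + ep_b[j:]
    child_2 = ep_b[:j] + ep_a[i:]
    return [child_1, child_2]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy episodes over a 2-D state space that pass through (nearly)
    # the same state, so a crossover point exists.
    shared = np.array([0.5, 0.5])
    ep_a = [(rng.random(2), 0, 0.0, None, False),
            (shared, 1, 1.0, None, False),
            (rng.random(2), 0, 0.0, None, True)]
    ep_b = [(rng.random(2), 1, 0.0, None, False),
            (shared + 1e-4, 0, 0.5, None, False),
            (rng.random(2), 1, 1.0, None, True)]
    children = crossover_episodes(ep_a, ep_b)
    print(f"Generated {len(children)} new episode trajectories")
```

The spliced children are valid only to the extent that the two states at the crossover point are close enough for the subsequent dynamics to remain plausible, which is why the similarity threshold matters.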