Received: February 12, 2022; Revised: March 23, 2022
Abstract: Although deep reinforcement learning can solve many complex control problems, it comes at the cost of a large number of interactions with the environment, which is a major challenge for deep reinforcement learning. One reason for this problem is that it is difficult for an agent to extract effective features from high-dimensional, complex inputs by relying on the value-function loss alone. As a result, the agent has an insufficient understanding of its state and cannot assign value to states correctly. Therefore, to help the agent understand its environment and to improve the sample efficiency of reinforcement learning, this study proposes a representation learning method that combines forward state prediction with a latent-space constraint, called regularized predictive representation learning (RPRL). The method helps the agent learn and extract state features from high-dimensional visual inputs, thereby improving sample efficiency. A forward state-transition loss serves as an auxiliary loss so that the features learned by the agent contain dynamic information about environment transitions. On top of the forward prediction, a regularization term constrains the state representation in the latent space, which further helps the agent learn a smooth, regular representation of the high-dimensional input. On the DeepMind Control (DMControl) suite, the proposed method achieves better performance than both model-based methods and model-free methods augmented with representation learning.
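For readers who want a concrete picture of the auxiliary objective sketched in the abstract, the following is a minimal PyTorch illustration: a forward state-transition prediction loss in latent space plus a regularization term on the latent states. The network sizes, the stop-gradient target encoding, and the choice of an L2 penalty as the latent-space regularizer are illustrative assumptions for this sketch, not the paper's exact architecture or hyperparameters.

```python
# Minimal sketch of an RPRL-style auxiliary loss (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a stack of image frames to a low-dimensional latent state."""
    def __init__(self, in_channels=9, latent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
        )
        self.fc = nn.LazyLinear(latent_dim)  # infers the flattened conv size

    def forward(self, obs):
        h = self.conv(obs)
        return self.fc(h.flatten(start_dim=1))

class TransitionModel(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim=50, action_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def rprl_auxiliary_loss(encoder, transition, obs, action, next_obs, reg_coef=1e-3):
    """Forward-prediction loss plus a latent-space regularizer (assumed L2)."""
    z = encoder(obs)
    # Stop-gradient target, a common trick for latent prediction objectives;
    # the paper may instead use a separate or momentum target encoder.
    with torch.no_grad():
        z_next_target = encoder(next_obs)
    z_next_pred = transition(z, action)
    prediction_loss = F.mse_loss(z_next_pred, z_next_target)
    # L2 penalty on latent states, assumed here as the smoothness regularizer.
    reg_loss = reg_coef * z.pow(2).sum(dim=-1).mean()
    return prediction_loss + reg_loss

# Example usage with random tensors standing in for stacked DMControl frames.
encoder, transition = Encoder(), TransitionModel()
obs = torch.randn(8, 9, 84, 84)
action = torch.randn(8, 6)
next_obs = torch.randn(8, 9, 84, 84)
loss = rprl_auxiliary_loss(encoder, transition, obs, action, next_obs)
```

In practice this auxiliary loss would be added to the agent's value-function loss, so the shared encoder is shaped both by value estimation and by the transition-prediction signal.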
Keywords: reinforcement learning; representation method; state transition; latent space constraint; continuous control; high-dimensional input
Funding: National Natural Science Foundation of China (61562009); Science and Technology Foundation of Guizhou Province (Qiankehe Jichu [2020]1Y275); Science and Technology Program of Guizhou Province (Qiankehe Jichu [2019]1130)
Citation:
XIANG Yu,QIN Jin,YUAN Lin-Lin.Reinforcement Learning Representation Algorithm Combining Forward State Prediction and Latent Space Regularization.COMPUTER SYSTEMS APPLICATIONS,2022,31(11):148-156