Offline to Online Reinforcement Learning Combining Dynamic Replay Buffer and Time Decaying Constraint
    Abstract:

    In offline-to-online reinforcement learning, the agent can leverage pre-collected offline data for initial policy learning, but the online fine-tuning phase is often unstable in its early stages, and the performance gain from fine-tuning is relatively small. To address this issue, two key designs are proposed: 1) a simulated annealing-based dynamic offline-online replay buffer and 2) simulated annealing-based behavior constraint attenuation. The first design uses the simulated annealing concept to select dynamically between offline data and online interaction experiences during training, yielding an optimized update strategy that balances the stability of online training against fine-tuning performance. The second design introduces a behavior cloning constraint with a cooling mechanism: it mitigates the sharp performance drop caused by updating on online experience early in fine-tuning, and gradually relaxes the constraint in later stages to improve model performance. Experimental results show that the proposed dynamic replay buffer and time decaying constraints (DRB-TDC) algorithm improves performance after online fine-tuning by 45%, 65%, and 21% on the HalfCheetah, Hopper, and Walker2d tasks of the MuJoCo benchmark, respectively. The average normalized score across all tasks exceeds that of the best baseline algorithm by 10%.
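
    The two designs can be illustrated with a minimal Python sketch. The abstract does not give the actual schedules or loss, so everything here is an assumption made for illustration: the exponential cooling rule, the use of the temperature as the probability of drawing from the offline buffer, the TD3+BC-style actor loss, and all names (`temperature`, `offline_probability`, `sample_batch`, `bc_weight`, `actor_loss`). The sketch uses only a cooling temperature and omits any acceptance step that the paper's simulated annealing procedure may include; it should not be read as the DRB-TDC implementation.

```python
import random


def temperature(step, t0=1.0, cooling=0.999):
    """Assumed exponential cooling schedule: T_k = t0 * cooling**k."""
    return t0 * cooling ** step


def offline_probability(step, t0=1.0, cooling=0.999):
    """Assumed rule: the clipped temperature is the chance of using offline data."""
    return min(1.0, max(0.0, temperature(step, t0, cooling)))


def sample_batch(offline_buffer, online_buffer, batch_size, step, rng=random):
    """Draw each transition from the offline or online buffer.

    Early in fine-tuning (high temperature) most samples are offline,
    which favors stability; later, online samples dominate.
    """
    p = offline_probability(step)
    batch = []
    for _ in range(batch_size):
        # Fall back to offline data while the online buffer is still empty.
        use_offline = rng.random() < p or not online_buffer
        source = offline_buffer if use_offline else online_buffer
        batch.append(rng.choice(source))
    return batch


def bc_weight(step, w0=1.0, cooling=0.999, w_min=0.0):
    """Assumed time-decaying weight on the behavior cloning term."""
    return max(w_min, w0 * cooling ** step)


def actor_loss(q_value, bc_error, step):
    """Hypothetical TD3+BC-style objective: maximize Q, penalize deviation
    from the data actions, with a penalty that cools over time."""
    return -q_value + bc_weight(step) * bc_error
```

    Under these assumptions, a batch drawn at step 0 comes entirely from the offline buffer, and the behavior cloning weight shrinks toward `w_min` as the step count grows, which mirrors the strong-early, relaxed-late constraint described above.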

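Citation:

YAN Leiming, ZHU Yongxin, LIU Jian. Offline to Online Reinforcement Learning Combining Dynamic Replay Buffer and Time Decaying Constraint. Computer Systems & Applications (计算机系统应用),,():1-10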

History
  • Received: October 22, 2024
  • Revised: November 19, 2024
  • Online: March 24, 2025