Reinforcement learning (RL) has demonstrated the ability to maintain policy plasticity throughout short-term training in aerial robot control. However, these policies have been shown to lose plasticity when training is extended to long-term learning in non-stationary environments. For example, a standard proximal policy optimization (PPO) policy is observed to collapse in long-term training settings, leading to significant control performance degradation. To address this problem, this work proposes a cost-aware framework that uses a retrospective cost mechanism (RECOM) to balance rewards and losses during RL training in a non-stationary environment. Using a cost gradient relation between rewards and losses, our framework dynamically updates the learning rate to actively train the control policy in a disturbed wind environment. Our experimental results show that our framework learned a policy for the hovering task without policy collapse under variable wind conditions, yielding 11.72% fewer dormant units than L2 regularization with PPO.
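The learning-rate adaptation described above can be illustrated with a minimal sketch. All names and constants here (`retrospective_cost`, `update_learning_rate`, the blending weight `alpha`, the step-size gain `eta`) are illustrative assumptions, not the paper's exact formulation: the idea is simply that a scalar cost is formed from recent rewards and losses, and the learning rate is scaled down when that cost rises and up when it falls.

```python
import numpy as np

def retrospective_cost(rewards, losses, alpha=0.5):
    """Hypothetical scalar cost blending recent losses against rewards.

    `rewards` and `losses` are arrays over a recent window; `alpha`
    weights the loss term. This is an illustrative stand-in for the
    paper's retrospective cost, not its exact definition.
    """
    return alpha * np.mean(losses) - (1.0 - alpha) * np.mean(rewards)

def update_learning_rate(lr, cost_prev, cost_curr, eta=0.1,
                         lr_min=1e-5, lr_max=1e-3):
    """Scale the learning rate by the relative change in cost:
    a rising cost shrinks lr, a falling cost grows it."""
    if cost_prev != 0.0:
        rel_change = (cost_curr - cost_prev) / abs(cost_prev)
        lr = lr * (1.0 - eta * np.tanh(rel_change))
    return float(np.clip(lr, lr_min, lr_max))
```

In a PPO training loop, such an update would run once per rollout, keeping the optimizer step size responsive to the shifting wind disturbance.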
Aerial robot hovering from different initial positions under variable wind conditions.
During the 20M-timestep training run, the wind disturbance was changed every 2M timesteps, cycling through [3.0, 2.0, 2.5, 1.5, 2.5] m/s.
Comparison of different reinforcement learning agents in training performance with wind disturbance.
Change of dormant units in the policy network during training under the wind disturbance.
| Standard PPO | L2 Regularization with PPO | RECOM with L2 PPO |
|---|---|---|
| Success Rate (%): 30 | Success Rate (%): 88 | Success Rate (%): 90 |
Simulation results: MSE metric comparison for the three policies.
Our experimental results show that the baseline PPO policy collapses under dynamic wind changes, leading to significant control performance degradation. By integrating the proposed RECOM with the PPO algorithm, the policy remains stable throughout long-term training, and dormant units in the neural network are effectively utilized.
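The dormant-unit measurements reported above can be reproduced with a short sketch. This follows a common definition from the plasticity literature: a unit is dormant when its mean absolute activation over a batch, normalized by the layer average, falls below a threshold `tau`. The threshold value and function name here are assumptions for illustration; the paper's exact criterion may differ.

```python
import numpy as np

def dormant_fraction(activations, tau=0.025):
    """Fraction of dormant units in one layer.

    `activations`: array of shape (batch, units), e.g. post-activation
    outputs collected during evaluation rollouts. A unit is counted as
    dormant when its batch-mean |activation|, divided by the mean over
    all units in the layer, is below `tau` (an assumed threshold).
    """
    per_unit = np.mean(np.abs(activations), axis=0)  # (units,)
    layer_mean = per_unit.mean()
    if layer_mean == 0.0:
        return 1.0  # the whole layer is silent
    scores = per_unit / layer_mean
    return float(np.mean(scores < tau))
```

Tracking this fraction per layer over training timesteps yields curves like those shown for the policy network under wind disturbance.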
@misc{karasahin2025maintainingplasticityreinforcementlearning,
title={Maintaining Plasticity in Reinforcement Learning: A Cost-Aware Framework for Aerial Robot Control in Non-stationary Environments},
author={Ali Tahir Karasahin and Ziniu Wu and Basaran Bahadir Kocer},
year={2025},
eprint={2503.00282},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2503.00282},
}