DeLTA seminar by Yi-Shan Wu

Speaker
Yi-Shan Wu, SDU
Title
Estimation Bias and Exploration Challenges in Off-Policy RL
Abstract
The performance of off-policy reinforcement learning (RL) critically depends on accurate Q-value estimation. While recent off-policy actor-critic algorithms have shown success in continuous control tasks, much of this progress hinges on using pessimistic Q-value estimates to mitigate the notorious overestimation bias. However, such pessimism can lead to under-exploration, limiting the agent’s ability to discover better policies. On the other hand, optimism can encourage exploration but risks instability and poor convergence if not properly balanced.
In this talk, I will present our approach to navigating this trade-off. Our methods decouple critic-side pessimism from actor-side exploration and make both learnable, allowing the agent to adaptively balance stability and exploration. We demonstrate the effectiveness of this approach across a range of continuous control benchmarks.
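For readers less familiar with the pessimism mechanism the abstract refers to, below is a minimal sketch of how off-policy actor-critic methods commonly build pessimistic bootstrap targets from two critics. A scalar coefficient `beta` stands in here for an adjustable degree of pessimism; the function name, the coefficient, and the blending scheme are illustrative assumptions, not the speaker's method, which instead makes such control learnable and independent on the critic and actor sides.

```python
import torch

def pessimistic_target(q1: torch.Tensor, q2: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Blend two critic estimates into a single bootstrap value.

    beta = 1.0 recovers the clipped double-Q target (elementwise min of
    the two critics, as in TD3/SAC); beta = 0.0 uses their average;
    intermediate values trade pessimism against optimism. `beta` is a
    hypothetical knob for illustration only.
    """
    q_min = torch.min(q1, q2)   # pessimistic estimate: counters overestimation bias
    q_avg = 0.5 * (q1 + q2)    # neutral estimate: no explicit bias correction
    return beta * q_min + (1.0 - beta) * q_avg

# TD target for a small batch of transitions (toy numbers for illustration).
rewards = torch.tensor([1.0, 0.5])
dones   = torch.tensor([0.0, 1.0])    # 1.0 marks a terminal transition
q1_next = torch.tensor([10.0, 2.0])   # first critic's values at the next state
q2_next = torch.tensor([8.0, 3.0])    # second critic's values at the next state
gamma   = 0.99

td_target = rewards + gamma * (1.0 - dones) * pessimistic_target(q1_next, q2_next, beta=0.8)
```

With beta fixed at 1 this is exactly the clipped double-Q construction; the trade-off the talk addresses is that such hand-fixed pessimism stabilizes learning but can suppress exploration.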
Join the DeLTA community
You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
See also the DeLTA online calendar and the DeLTA Lab page.