DeLTA seminar by Yi-Shan Wu


Participate on Zoom

Speaker

Yi-Shan Wu, SDU

Title

Estimation Bias and Exploration Challenges in Off-Policy RL

Abstract

The performance of off-policy reinforcement learning (RL) critically depends on accurate Q-value estimation. While recent off-policy actor-critic algorithms have shown success in continuous control tasks, much of this progress hinges on pessimistic Q-value estimates that mitigate the notorious overestimation bias. Such pessimism, however, can lead to under-exploration, limiting the agent's ability to discover better policies. Conversely, optimism can encourage exploration but risks instability and poor convergence if not properly balanced.
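As a concrete illustration of the pessimism the abstract refers to (standard background, not the speaker's method): clipped double Q-learning, as used in TD3 and SAC, takes the minimum over two critic estimates when forming the bootstrap target. A minimal sketch, with illustrative names:

```python
import torch

def pessimistic_td_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target (TD3-style): taking the elementwise min
    over twin critics yields a pessimistic value estimate that counters
    overestimation bias."""
    next_q = torch.min(next_q1, next_q2)           # pessimistic aggregation
    return reward + gamma * (1.0 - done) * next_q  # standard TD backup
```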

In this talk, I will present our approach to navigating this trade-off. Our methods enable independent and learnable control over critic-side pessimism and actor-side exploration, allowing the agent to adaptively balance stability and exploration. We demonstrate the effectiveness of this approach across a range of continuous control benchmarks.
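One generic way to make the degree of pessimism tunable, offered here only as a sketch of the design space and not as the speaker's formulation, is to interpolate between the mean and the disagreement of the twin critics with a coefficient beta; the function name, the coefficient, and the update scheme are all assumptions:

```python
import torch

def tunable_pessimism_target(reward, done, next_q1, next_q2, beta, gamma=0.99):
    """Hypothetical target with a tunable pessimism coefficient `beta`:
    beta = 1 recovers the clipped double-Q min, beta = 0 the mean,
    and beta < 0 is optimistic. Illustrative sketch only."""
    q_mean = 0.5 * (next_q1 + next_q2)             # average of the twin critics
    q_spread = 0.5 * torch.abs(next_q1 - next_q2)  # disagreement as an uncertainty proxy
    next_q = q_mean - beta * q_spread              # beta sets the pessimism level
    return reward + gamma * (1.0 - done) * next_q
```

Making beta itself learnable, and decoupling the value used by the critic update from the one guiding the actor's exploration, would give the kind of independent, adaptive control the abstract describes, though the speaker's actual parameterization may differ.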

Join the DeLTA community

You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
DeLTA online calendar
DeLTA Lab page