DeLTA seminar by Wei-Cheng Lee
Speaker
Wei-Cheng Lee, KAUST
Title
A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak-Ruppert Averaging
Abstract
Temporal-difference (TD) learning is a fundamental algorithmic primitive in reinforcement learning, used for policy evaluation and as a building block for control methods. Despite its simple update rule, finite-time analysis of TD with linear function approximation remains delicate, especially under Markovian sampling, where data are generated along a single trajectory. Existing high-probability analyses often require either explicit projections to keep the iterates bounded, or stepsize choices that depend on an unknown curvature parameter.
In this talk, I will present recent work showing that a single curvature-independent stepsize schedule suffices for unprojected linear TD(0) with Polyak-Ruppert averaging. The same algorithm achieves both a robust curvature-free rate and a fast curvature-dependent rate, without knowing the curvature in advance. The main technical ingredient is a self-bounding stability argument based on Poisson-equation decompositions for geometrically mixing Markov chains, which allows us to control the TD iterates pathwise without projections. I will start with background on TD learning and the role of robust versus fast rates, and then discuss the main theorem and proof ideas.
This is joint work with Francesco Orabona.
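For attendees less familiar with the setting, the object studied in the abstract, unprojected linear TD(0) with Polyak-Ruppert averaging, can be sketched in a few lines of Python. This is only an illustrative sketch and not the speaker's implementation: the function name `averaged_td0`, the constant `c`, and in particular the stepsize form `c / sqrt(t + 1)` are assumptions chosen because they do not depend on any curvature parameter; the schedule analyzed in the talk may differ.

```python
import numpy as np


def averaged_td0(features, rewards, next_features, gamma=0.9, c=0.5):
    """Unprojected linear TD(0) with Polyak-Ruppert (uniform) averaging.

    features[t]      : feature vector phi(s_t), shape (d,)
    next_features[t] : feature vector phi(s_{t+1}), shape (d,)
    rewards[t]       : reward observed on the transition s_t -> s_{t+1}

    Returns the last iterate and the running average of all iterates.
    """
    n, d = features.shape
    theta = np.zeros(d)       # TD iterate; never projected or clipped
    theta_bar = np.zeros(d)   # Polyak-Ruppert average of theta_1..theta_n

    for t in range(n):
        phi, phi_next = features[t], next_features[t]

        # ASSUMPTION: a curvature-independent schedule of the form c / sqrt(t + 1).
        # It is used here for illustration only.
        alpha = c / np.sqrt(t + 1)

        # TD error: r_t + gamma * V(s_{t+1}) - V(s_t), with V(s) = phi(s) @ theta
        td_error = rewards[t] + gamma * (phi_next @ theta) - (phi @ theta)

        # Plain semi-gradient update, with no projection step
        theta = theta + alpha * td_error * phi

        # Incremental update of the uniform average of the iterates
        theta_bar += (theta - theta_bar) / (t + 1)

    return theta, theta_bar
```

As a sanity check, on a one-state problem with constant feature 1, reward 1, and gamma = 0.5, the TD fixed point is 1 / (1 - 0.5) = 2, and both returned vectors should approach that value as the number of steps grows.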
Join the DeLTA community
You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
DeLTA online calendar
DeLTA Lab page