DeLTA seminar by Yi-Shan Wu


Participate on Zoom

Speaker

Yi-Shan Wu, SDU

Title

Optimistic Posterior Sampling for Non-Stationary Linear Contextual Bandits

Abstract

We study the non-stationary linear contextual bandit problem under a drifting reward model, focusing on weighted strategies for adapting to non-stationarity and posterior sampling approaches for exploration. While weighting provides a natural mechanism for adapting to changes over time, existing theoretical analyses often lead to suboptimal regret guarantees. Moreover, although posterior sampling methods often outperform UCB-based approaches empirically, they generally suffer from worse theoretical guarantees. Motivated by recent findings that standard Thompson Sampling can under-explore due to overly concentrated posteriors, we explore optimistic variants of the D-LinTS algorithm (Kim & Tewari, 2021) designed to encourage sufficient exploration and improve regret bounds. This is ongoing work—our current proof technique has run into challenges—but we aim to share key ideas, partial results, and open questions, in hopes of sparking discussion and generating feedback or potential solutions.
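To fix ideas, below is a minimal, hypothetical Python sketch of discounted (weighted) linear Thompson Sampling for a drifting linear contextual bandit: past observations are exponentially down-weighted so the estimator tracks a changing parameter, and the parameter is sampled from a Gaussian around the weighted ridge estimate. The class name, the discount factor `gamma`, and the `inflation` knob that widens the sampling covariance are illustrative assumptions for exposition only; this is not the D-LinTS construction of Kim & Tewari (2021) nor the optimistic variant analysed in the talk.

```python
import numpy as np

class DiscountedLinTS:
    """Hypothetical sketch: discounted linear Thompson Sampling.

    Old observations are exponentially forgotten (handles reward drift),
    and the parameter is sampled from a Gaussian centred at the weighted
    ridge estimate. `inflation` > 1 widens the sampling covariance to
    encourage extra exploration; it is an illustrative knob, not the
    construction studied in the talk.
    """

    def __init__(self, dim, gamma=0.99, reg=1.0, noise_std=1.0,
                 inflation=1.0, seed=0):
        self.dim = dim
        self.gamma = gamma          # discount factor in (0, 1]
        self.reg = reg              # ridge regularisation lambda
        self.noise_std = noise_std  # assumed reward noise scale
        self.inflation = inflation  # covariance inflation (optimism knob)
        self.rng = np.random.default_rng(seed)
        self.V = reg * np.eye(dim)  # weighted design matrix + lambda*I
        self.b = np.zeros(dim)      # weighted reward-feature vector

    def select(self, arm_features):
        """arm_features: (n_arms, dim) array; returns the chosen arm index."""
        theta_hat = np.linalg.solve(self.V, self.b)
        cov = self.inflation * self.noise_std ** 2 * np.linalg.inv(self.V)
        theta_tilde = self.rng.multivariate_normal(theta_hat, cov)
        return int(np.argmax(arm_features @ theta_tilde))

    def update(self, x, reward):
        """Discount past statistics (keeping lambda*I fixed), then add the new point."""
        self.V = (self.gamma * self.V
                  + (1 - self.gamma) * self.reg * np.eye(self.dim)
                  + np.outer(x, x))
        self.b = self.gamma * self.b + reward * x


# Toy usage: 5 arms in 3 dimensions with a slowly drifting true parameter.
agent = DiscountedLinTS(dim=3, gamma=0.98, inflation=2.0)
rng = np.random.default_rng(42)
theta_true = np.array([1.0, 0.0, -1.0])
for t in range(1000):
    X = rng.normal(size=(5, 3))           # per-round arm features
    a = agent.select(X)
    r = X[a] @ theta_true + 0.1 * rng.normal()
    agent.update(X[a], r)
    theta_true += 0.001                    # gradual drift of the reward model
```

The sketch only illustrates the two ingredients the abstract contrasts, weighting for non-stationarity and posterior sampling for exploration; the talk concerns how to choose the sampling distribution so that the resulting regret guarantees improve.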

Join the DeLTA community

You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
DeLTA online calendar
DeLTA Lab page