DeLTA seminar by Aviv Rosenberg


The talk will be given online, but you are welcome to join us in person at SCI-DIKU-UP1-2-0-06 to watch it together.

Speaker

Aviv Rosenberg, Amazon, Tel Aviv

Title

Delay and Cooperation in Reinforcement Learning

Abstract

In this talk I will present our recent results on regret minimization under two challenges that naturally arise in many reinforcement learning (RL) applications: delay and cooperation.

Cooperation: In cooperative RL, m agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: fresh, where each agent's trajectory is sampled i.i.d., and non-fresh, where the realization is shared by all agents (although each agent's trajectory is still affected by its own actions). More precisely, with non-fresh randomness the realization of every cost and transition is fixed at the start of each episode, so agents that take the same action in the same state at the same time step observe the same cost and next state.

Delay: The standard assumption in RL is that agents observe feedback for their actions immediately. In practice, however, feedback is often delayed. We study online learning in episodic Markov decision processes (MDPs) with unknown transitions, adversarially changing costs, and unrestricted delayed bandit feedback. We present the first algorithms that achieve near-optimal (K+D)^{1/2} regret, where K is the number of episodes and D is the total delay, significantly improving upon the best previously known regret bound of (K+D)^{2/3}.
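To make the fresh versus non-fresh distinction from the cooperation setting concrete, here is a minimal Python sketch of how the feedback observed by m agents in one episode could be generated under each type of randomness. It is not the speaker's algorithm and performs no learning; it only simulates observations. The tabular representation (next-state probabilities `P[s][a]`, Bernoulli cost means `C[s][a]`), the fixed initial state 0, and the function names `draw`, `sample_realization` and `run_episode` are hypothetical choices made for illustration.

```python
import random

# Hypothetical tabular MDP (illustrative assumption):
#   P[s][a] is a list of next-state probabilities,
#   C[s][a] is the mean of a Bernoulli cost in {0, 1}.

def draw(P, C, s, a, rng):
    """Sample one (cost, next_state) pair for taking action a in state s."""
    next_state = rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
    cost = 1.0 if rng.random() < C[s][a] else 0.0
    return cost, next_state

def sample_realization(P, C, H, rng):
    """Non-fresh randomness: fix one outcome for every (step, state, action)
    at the start of the episode, before any agent acts."""
    return {
        (h, s, a): draw(P, C, s, a, rng)
        for h in range(H)
        for s in range(len(P))
        for a in range(len(P[s]))
    }

def run_episode(policies, P, C, H, rng, fresh):
    """Simulate one episode of horizon H for m = len(policies) agents.

    fresh=True:  every agent draws its own outcomes independently.
    fresh=False: all agents read outcomes from one shared realization, so
                 agents taking the same action in the same state at the
                 same step observe the same cost and next state.
    """
    shared = None if fresh else sample_realization(P, C, H, rng)
    trajectories = []
    for policy in policies:
        s = 0  # assumed fixed initial state
        traj = []
        for h in range(H):
            a = policy(h, s)
            if fresh:
                cost, next_state = draw(P, C, s, a, rng)
            else:
                cost, next_state = shared[(h, s, a)]
            traj.append((s, a, cost))
            s = next_state
        trajectories.append(traj)
    return trajectories
```

With `fresh=False`, agents that follow the same policy receive identical trajectories, and agents whose policies differ still share outcomes whenever they reach the same state at the same step and pick the same action; with `fresh=True`, identical policies generally produce different trajectories.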

-------------------
You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
Online calendar: https://calendar.google.com/calendar/embed?src=c_bm6u2c38ec3ti4lbfjd13c2aqg%40group.calendar.google.com&ctz=Europe%2FCopenhagen
DeLTA Lab page: https://sites.google.com/diku.edu/delta