DeLTA seminar by Aviv Rosenberg
 
The talk will be online, but you are welcome to join us in person at SCI-DIKU-UP1-2-0-06 to watch it together.
Speaker
Aviv Rosenberg, Amazon, Tel Aviv
Title
Delay and Cooperation in Reinforcement Learning
Abstract
In this talk, I will present our recent results on regret minimization under two challenges that naturally arise in many reinforcement learning (RL) applications: delay and cooperation.

Cooperation: In cooperative RL, m agents interact with the same Markov decision process (MDP) simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: fresh, where each agent's trajectory is sampled i.i.d., and non-fresh, where the realization is shared by all agents (although each agent's trajectory is still affected by its own actions). More precisely, with non-fresh randomness, the realization of every cost and transition is fixed at the start of each episode, so agents that take the same action in the same state at the same time step observe the same cost and next state.

Delay: The standard assumption in RL is that agents observe feedback for their actions immediately. In practice, however, feedback is often observed with a delay. We study online learning in episodic MDPs with unknown transitions, adversarially changing costs, and unrestricted delayed bandit feedback. We present the first algorithms that achieve near-optimal (K+D)^{1/2} regret, where K is the number of episodes and D is the total delay, significantly improving upon the best previously known regret bound of (K+D)^{2/3}.
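To make the difference between fresh and non-fresh randomness concrete, here is a minimal Python sketch of one episode in a toy tabular MDP. Everything in it (the states, actions, horizon H, the cost and transition functions, and the name run_episode) is a hypothetical illustration, not the speakers' setup or code. Costs are Bernoulli draws and transitions are sampled from a fixed distribution. Under non-fresh randomness, one outcome per (step, state, action) triple is drawn at the start of the episode and shared, so agents that take the same action in the same state at the same step see the same cost and next state. Under fresh randomness, every agent draws its own outcomes independently.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy MDP (illustrative assumptions, not from the talk).
STATES = [0, 1, 2]
ACTIONS = [0, 1]
H = 3  # number of steps per episode

Policy = Callable[[int, int], int]     # (step h, state s) -> action a
Step = Tuple[int, int, int, int, int]  # (h, s, a, cost, next_state)


def mean_cost(h: int, s: int, a: int) -> float:
    """Illustrative expected cost in [0, 1] for action a in state s at step h."""
    return ((h + 1) * (s + 1) * (a + 2) % 7) / 7


def next_state_probs(h: int, s: int, a: int) -> List[float]:
    """Illustrative transition distribution over STATES."""
    weights = [1 + (s + a + h + t) % 3 for t in STATES]
    total = sum(weights)
    return [w / total for w in weights]


def sample_outcome(rng: random.Random, h: int, s: int, a: int) -> Tuple[int, int]:
    """Draw one (Bernoulli cost, next state) outcome for (h, s, a)."""
    cost = 1 if rng.random() < mean_cost(h, s, a) else 0
    next_state = rng.choices(STATES, weights=next_state_probs(h, s, a))[0]
    return cost, next_state


def run_episode(policies: List[Policy], fresh: bool, seed: int) -> List[List[Step]]:
    """Run one episode for m = len(policies) agents, all starting in state 0."""
    rng = random.Random(seed)
    shared = {}
    if not fresh:
        # Non-fresh: fix the realization of every cost and transition
        # once, at the start of the episode, for all agents.
        shared = {(h, s, a): sample_outcome(rng, h, s, a)
                  for h in range(H) for s in STATES for a in ACTIONS}
    trajectories = []
    for policy in policies:
        s, traj = 0, []
        for h in range(H):
            a = policy(h, s)
            if fresh:
                # Fresh: this agent gets its own independent draw.
                cost, nxt = sample_outcome(rng, h, s, a)
            else:
                # Non-fresh: look up the shared realization for (h, s, a).
                cost, nxt = shared[(h, s, a)]
            traj.append((h, s, a, cost, nxt))
            s = nxt
        trajectories.append(traj)
    return trajectories
```

For example, with three agents that all follow the same deterministic policy, run_episode(..., fresh=False, ...) returns three identical trajectories, whereas with fresh=True the trajectories generally differ, even though each agent's own actions still shape where it goes in both cases.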
-------------------
You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
Online calendar: https://calendar.google.com/calendar/embed?src=c_bm6u2c38ec3ti4lbfjd13c2aqg%40group.calendar.google.com&ctz=Europe%2FCopenhagen
DeLTA Lab page: https://sites.google.com/diku.edu/delta
 
DeLTA is a research group affiliated with the Department of Computer Science at the University of Copenhagen studying diverse aspects of Machine Learning Theory and its applications, including, but not limited to, Reinforcement Learning, Online Learning and Bandits, and PAC-Bayesian analysis.