A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs. / Rouyer, Chloé; van der Hoeven, Dirk; Cesa-Bianchi, Nicolò; Seldin, Yevgeny.
Advances in Neural Information Processing Systems 35 (NeurIPS 2022). NeurIPS Proceedings, 2022. p. 35035-35048.
RIS
TY - GEN
T1 - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
AU - Rouyer, Chloé
AU - van der Hoeven, Dirk
AU - Cesa-Bianchi, Nicolò
AU - Seldin, Yevgeny
PY - 2022
Y1 - 2022
N2 - We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is Õ(√(αT)), where T is the time horizon and α is the independence number of the feedback graph. The bound against stochastic environments is O((ln T)^2 max_{S∈I(G)} Σ_{i∈S} Δ_i^{-1}), where I(G) is the family of all independent sets in a suitably defined undirected version of the graph and Δ_i are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtaining best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.
AB - We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is Õ(√(αT)), where T is the time horizon and α is the independence number of the feedback graph. The bound against stochastic environments is O((ln T)^2 max_{S∈I(G)} Σ_{i∈S} Δ_i^{-1}), where I(G) is the family of all independent sets in a suitably defined undirected version of the graph and Δ_i are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtaining best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.
M3 - Article in proceedings
SP - 35035
EP - 35048
BT - Advances in Neural Information Processing Systems 35 (NeurIPS 2022)
PB - NeurIPS Proceedings
Y2 - 28 November 2022 through 9 December 2022
ER -
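Both regret bounds in the abstract are governed by the independence number α of the feedback graph, i.e. the size of a largest set of mutually non-adjacent actions. As a minimal illustration of this quantity (not code from the paper), the following brute-force sketch computes α for a small undirected graph:

```python
from itertools import combinations

def independence_number(n, edges):
    """Brute-force the independence number alpha(G) of an undirected
    graph on vertices 0..n-1. Exponential time; fine for small n."""
    edge_set = {frozenset(e) for e in edges}
    # Search from the largest candidate set size downward.
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            # An independent set contains no edge between its vertices.
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return size
    return 0

# Example: the 5-cycle has independence number 2.
print(independence_number(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))
```

For bandit feedback the graph has no edges between distinct actions, so α equals the number of actions and the adversarial bound recovers the familiar Õ(√(KT)) rate; for full-information feedback the graph is complete, α = 1, and the bound becomes Õ(√T).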