A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs. / Rouyer, Chloé; van der Hoeven, Dirk; Cesa-Bianchi, Nicolò; Seldin, Yevgeny.

Advances in Neural Information Processing Systems 35 (NeurIPS 2022). NeurIPS Proceedings, 2022. p. 35035-35048.

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Rouyer, C, van der Hoeven, D, Cesa-Bianchi, N & Seldin, Y 2022, A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs. in Advances in Neural Information Processing Systems 35 (NeurIPS 2022). NeurIPS Proceedings, pp. 35035-35048, 36th Conference on Neural Information Processing Systems (NeurIPS 2022)., New Orleans/ Virtual, United States, 28/11/2022. <https://proceedings.neurips.cc/paper_files/paper/2022/file/e36da7acd188c6655792270b38830124-Supplemental-Conference.pdf>

APA

Rouyer, C., van der Hoeven, D., Cesa-Bianchi, N., & Seldin, Y. (2022). A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (pp. 35035-35048). NeurIPS Proceedings. https://proceedings.neurips.cc/paper_files/paper/2022/file/e36da7acd188c6655792270b38830124-Supplemental-Conference.pdf

Vancouver

Rouyer C, van der Hoeven D, Cesa-Bianchi N, Seldin Y. A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022). NeurIPS Proceedings. 2022. p. 35035-35048

Author

Rouyer, Chloé ; van der Hoeven, Dirk ; Cesa-Bianchi, Nicolò ; Seldin, Yevgeny. / A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). NeurIPS Proceedings, 2022. pp. 35035-35048

Bibtex

@inproceedings{a9c94c2048a240fd81dc3fa87bf21c4a,
title = "A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs",
abstract = "We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is O~(αT−−−√), where T is the time horizon and α is the independence number of the feedback graph. The bound against stochastic environments is O((lnT)2maxS∈I(G)∑i∈SΔ−1i) where I(G) is the family of all independent sets in a suitably defined undirected version of the graph and Δi are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtain best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.",
author = "Chlo{\'e} Rouyer and {van der Hoeven}, Dirk and Nicol{\`o} Cesa-Bianchi and Yevgeny Seldin",
year = "2022",
language = "English",
pages = "35035--35048",
booktitle = "Advances in Neural Information Processing Systems 35 (NeurIPS 2022)",
publisher = "NeurIPS Proceedings",
note = "null ; Conference date: 28-11-2022 Through 09-12-2022",

}

RIS

TY - GEN

T1 - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

AU - Rouyer, Chloé

AU - van der Hoeven, Dirk

AU - Cesa-Bianchi, Nicolò

AU - Seldin, Yevgeny

PY - 2022

Y1 - 2022

N2 - We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is O~(αT−−−√), where T is the time horizon and α is the independence number of the feedback graph. The bound against stochastic environments is O((lnT)2maxS∈I(G)∑i∈SΔ−1i) where I(G) is the family of all independent sets in a suitably defined undirected version of the graph and Δi are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtain best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.

AB - We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is O~(αT−−−√), where T is the time horizon and α is the independence number of the feedback graph. The bound against stochastic environments is O((lnT)2maxS∈I(G)∑i∈SΔ−1i) where I(G) is the family of all independent sets in a suitably defined undirected version of the graph and Δi are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtain best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.

M3 - Article in proceedings

SP - 35035

EP - 35048

BT - Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

PB - NeurIPS Proceedings

Y2 - 28 November 2022 through 9 December 2022

ER -

ID: 383100739