Ian Osband

Cited by

	All	Since 2020
Citations	10670	8820
h-index	28	28
i10-index	37	34

Citations per year

Year	Citations
2015	27
2016	73
2017	243
2018	548
2019	899
2020	1295
2021	1548
2022	1688
2023	1858
2024	1834
2025	583

Co-authors

Benjamin Van Roy	Stanford University	Verified email at stanford.edu
Zheng Wen	Google DeepMind	Verified email at google.com
Vikranth Dwaracherla	DeepMind	Verified email at google.com
Xiuyuan Lu	Google DeepMind	Verified email at google.com
Bilal Piot	Google Deepmind	Verified email at google.com
Olivier Pietquin	Earth Species Project | ex Google DeepMind (On leave - Professor at University of Lille)	Verified email at univ-lille.fr
Rémi Munos	FAIR, Meta	Verified email at inria.fr
Mohammad Gheshlaghi Azar	Cohere	Verified email at cohere.com
Morteza Ibrahimi	Stanford University	Verified email at stanford.edu
Daniel Russo	Columbia University	Verified email at gsb.columbia.edu
Brendan O'Donoghue	Stanford University, Google DeepMind	Verified email at alumni.stanford.edu
Alexander Pritzel	Deepmind	Verified email at google.com
Todd Hester	Waymo	Verified email at waymo.com
Tom Schaul	Senior Staff Scientist, DeepMind	Verified email at nyu.edu
Marc Lanctot	Research Scientist, Google DeepMind	Verified email at google.com

Ian Osband

OpenAI

Verified email at openai.com

Reinforcement Learning


Title	Authors	Venue	Cited by	Year
Deep exploration via bootstrapped DQN	I Osband, C Blundell, A Pritzel, B Van Roy	Advances in Neural Information Processing Systems 29, 2016	1630	2016
Deep Q-learning from demonstrations	T Hester, M Vecerik, O Pietquin, M Lanctot, T Schaul, B Piot, D Horgan, ...	Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018	1387	2018
A tutorial on Thompson sampling	DJ Russo, B Van Roy, A Kazerouni, I Osband, Z Wen	Foundations and Trends® in Machine Learning 11 (1), 1-96, 2018	1278	2018
Noisy networks for exploration	M Fortunato, MG Azar, B Piot, J Menick, I Osband, A Graves, V Mnih, ...	arXiv preprint arXiv:1706.10295, 2017	1219	2017
Minimax regret bounds for reinforcement learning	MG Azar, I Osband, R Munos	International Conference on Machine Learning, 263-272, 2017	907	2017
Randomized prior functions for deep reinforcement learning	I Osband, J Aslanides, A Cassirer	Advances in Neural Information Processing Systems 31, 2018	478	2018
Deep Exploration via Randomized Value Functions	I Osband	https://searchworks.stanford.edu/view/11891201, 2016	366	2016
Generalization and exploration via randomized value functions	I Osband, B Van Roy, Z Wen	International Conference on Machine Learning, 2377-2386, 2016	356	2016
GPT-4o system card	A Hurst, A Lerer, AP Goucher, A Perelman, A Ramesh, A Clark, AJ Ostrow, ...	arXiv preprint arXiv:2410.21276, 2024	293	2024
Why is posterior sampling better than optimism for reinforcement learning?	I Osband, B Van Roy	International Conference on Machine Learning, 2701-2710, 2017	287	2017
The uncertainty Bellman equation and exploration	B O’Donoghue, I Osband, R Munos, V Mnih	International Conference on Machine Learning, 3836-3845, 2018	259	2018
Model-based reinforcement learning and the eluder dimension	I Osband, B Van Roy	Advances in Neural Information Processing Systems 27, 2014	213	2014
Behaviour suite for reinforcement learning	I Osband, Y Doron, M Hessel, J Aslanides, E Sezener, A Saraiva, ...	arXiv preprint arXiv:1908.03568, 2019	207	2019
Learning from demonstrations for real world reinforcement learning	T Hester, M Vecerik, O Pietquin, M Lanctot, T Schaul, B Piot, A Sendonaris, ...	arXiv preprint arXiv:1704.03732, 2017	185	2017
Risk versus Uncertainty in Deep Learning: Bayes, Bootstrap and the Dangers of Dropout	I Osband	http://bayesiandeeplearning.org/papers/BDL_4.pdf	176*	
Deep learning for time series modeling	E Busseti, I Osband, S Wong	Technical report, Stanford University, 1-5, 2012	146	2012
Epistemic neural networks	I Osband, Z Wen, SM Asghari, V Dwaracherla, M Ibrahimi, X Lu, ...	Advances in Neural Information Processing Systems 36, 2795-2823, 2023	137	2023
Near-optimal reinforcement learning in factored MDPs	I Osband, B Van Roy	Advances in Neural Information Processing Systems 27, 2014	134	2014
On lower bounds for regret in reinforcement learning	I Osband, B Van Roy	arXiv preprint arXiv:1608.02732, 2016	124	2016
(More) efficient reinforcement learning via posterior sampling	I Osband, D Russo, B Van Roy	Advances in Neural Information Processing Systems 26, 2013	121	2013
