Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2089 | 2023 |
Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift Y Ovadia, E Fertig, J Ren, Z Nado, D Sculley, S Nowozin, J Dillon, ... Advances in neural information processing systems 32, 2019 | 1896 | 2019 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 1406 | 2023 |
Underspecification presents challenges for credibility in modern machine learning A D'Amour, K Heller, D Moldovan, B Adlam, B Alipanahi, A Beutel, ... Journal of Machine Learning Research 23 (226), 1-61, 2022 | 792 | 2022 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 552 | 2024 |
On empirical comparisons of optimizers for deep learning D Choi arXiv preprint arXiv:1910.05446, 2019 | 387 | 2019 |
Evaluating prediction-time batch normalization for robustness under covariate shift Z Nado, S Padhy, D Sculley, A D'Amour, B Lakshminarayanan, J Snoek arXiv preprint arXiv:2006.10963, 2020 | 236 | 2020 |
Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model G Zhang, L Li, Z Nado, J Martens, S Sachdeva, G Dahl, C Shallue, ... Advances in neural information processing systems 32, 2019 | 149 | 2019 |
Plex: Towards reliability using pretrained large model extensions D Tran, J Liu, MW Dusenberry, D Phan, M Collier, J Ren, K Han, Z Wang, ... arXiv preprint arXiv:2207.07411, 2022 | 113 | 2022 |
Uncertainty baselines: Benchmarks for uncertainty & robustness in deep learning Z Nado, N Band, M Collier, J Djolonga, MW Dusenberry, S Farquhar, ... arXiv preprint arXiv:2106.04015, 2021 | 111 | 2021 |
A loss curvature perspective on training instabilities of deep learning models J Gilmer, B Ghorbani, A Garg, S Kudugunta, B Neyshabur, D Cardoze, ... International Conference on Learning Representations, 2022 | 71* | 2022 |
Benchmarking Bayesian deep learning on diabetic retinopathy detection tasks N Band, TGJ Rudner, Q Feng, A Filos, Z Nado, MW Dusenberry, G Jerfel, ... arXiv preprint arXiv:2211.12717, 2022 | 54 | 2022 |
PaLM 2 Technical Report; 2023 R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 51 | 2023 |
AG: Imperative-style Coding with Graph-based Performance D Moldovan, J Decker, F Wang, A Johnson, B Lee, Z Nado, D Sculley, ... Proceedings of Machine Learning and Systems 1, 389-405, 2019 | 50 | 2019 |
Adaptive gradient methods at the edge of stability JM Cohen, B Ghorbani, S Krishnan, N Agarwal, S Medapati, M Badura, ... arXiv preprint arXiv:2207.14484, 2022 | 49 | 2022 |
Revisiting one-vs-all classifiers for predictive uncertainty and out-of-distribution detection in neural networks S Padhy, Z Nado, J Ren, J Liu, J Snoek, B Lakshminarayanan arXiv preprint arXiv:2007.05134, 2020 | 48 | 2020 |
A simple approach to improve single-model deep uncertainty via distance-awareness JZ Liu, S Padhy, J Ren, Z Lin, Y Wen, G Jerfel, Z Nado, J Snoek, D Tran, ... Journal of Machine Learning Research 24 (42), 1-63, 2023 | 46 | 2023 |
A large batch optimizer reality check: Traditional, generic optimizers suffice across batch sizes Z Nado, JM Gilmer, CJ Shallue, R Anil, GE Dahl arXiv preprint arXiv:2102.06356, 2021 | 42 | 2021 |
Pre-trained Gaussian processes for Bayesian optimization Z Wang, GE Dahl, K Swersky, C Lee, Z Nado, J Gilmer, J Snoek, ... Journal of Machine Learning Research 25 (212), 1-83, 2024 | 31 | 2024 |
Benchmarking neural network training algorithms GE Dahl, F Schneider, Z Nado, N Agarwal, CS Sastry, P Hennig, ... arXiv preprint arXiv:2306.07179, 2023 | 19 | 2023 |