Critical initialization of wide and deep neural networks using partial Jacobians: General theory and applications. D. Doshi*, T. He*, A. Gromov. NeurIPS 2023 (Spotlight).
Universal sharpness dynamics in neural network training: Fixed point analysis, edge of stability, and route to chaos. D. Singh Kalra, T. He, M. Barkeshli. ICLR 2025.
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks. T. He, D. Doshi, A. Das, A. Gromov. NeurIPS 2024 (Oral).
Grokking modular polynomials. D. Doshi, T. He, A. Das, A. Gromov. ICLR 2024 Workshop (BGPT).
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets. D. Doshi, A. Das*, T. He*, A. Gromov. ICLR 2024.
AutoInit: Automatic initialization via Jacobian tuning. T. He*, D. Doshi*, A. Gromov. arXiv preprint arXiv:2206.13568, 2022.
(How) Can Transformers predict pseudo-random numbers? T. Tao*, D. Doshi*, D. S. Kalra*, T. He*, M. Barkeshli. arXiv preprint arXiv:2502.10390, 2025.
Exploring model depth and data complexity through the lens of cellular automata. T. He, D. Doshi, A. Das, A. Gromov. NeurIPS 2024 Workshop on Scientific Methods for Understanding Deep Learning.
Fracton models with crystalline symmetries in two dimensions. T. He, A. Gromov. APS March Meeting 2021, abstract L43.004.