Tianyu He
Title · Cited by · Year
Critical initialization of wide and deep neural networks using partial jacobians: General theory and applications
D Doshi*, T He*, A Gromov
NeurIPS 2023 Spotlight, 2024
Cited by 16* · 2024
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
D Singh Kalra, T He, M Barkeshli
ICLR 2025, 2025
Cited by 6* · 2025
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
T He, D Doshi, A Das, A Gromov
NeurIPS 2024 Oral, 2024
Cited by 6 · 2024
Grokking Modular Polynomials
D Doshi, T He, A Das, A Gromov
ICLR 2024 workshop BGPT, 2024
Cited by 2 · 2024
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
D Doshi, A Das*, T He*, A Gromov
ICLR 2024, 2024
Cited by 2 · 2024
AutoInit: Automatic Initialization via Jacobian Tuning
T He*, D Doshi*, A Gromov
arXiv preprint arXiv:2206.13568, 2022
Cited by 1 · 2022
(How) Can Transformers Predict Pseudo-Random Numbers?
T Tao*, D Doshi*, DS Kalra*, T He*, M Barkeshli
arXiv preprint arXiv:2502.10390, 2025
2025
Exploring model depth and data complexity through the lens of cellular automata
T He, D Doshi, A Das, A Gromov
NeurIPS 2024 Workshop on Scientific Methods for Understanding Deep Learning, 2024
2024
Fracton models with crystalline symmetries in two dimensions
T He, A Gromov
APS March Meeting Abstracts 2021, L43. 004, 2021
2021