Public access

Learning overparameterized neural networks via stochastic gradient descent on structured data

Y Li, Y Liang

Advances in neural information processing systems 31, 2018

Mandates: US National Science Foundation, US Department of Defense

A latent variable model approach to PMI-based word embeddings

S Arora, Y Li, Y Liang, T Ma, A Risteski

Transactions of the Association for Computational Linguistics 4, 385-399, 2016

Mandates: US National Science Foundation

Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations

Y Li, T Ma, H Zhang

Conference on Learning Theory, 2-47, 2018

Mandates: US National Science Foundation

Towards explaining the regularization effect of initial large learning rate in training neural networks

Y Li, C Wei, T Ma

Advances in neural information processing systems 32, 2019

Mandates: US National Science Foundation

Linear algebraic structure of word senses, with applications to polysemy

S Arora, Y Li, Y Liang, T Ma, A Risteski

Transactions of the Association for Computational Linguistics 6, 483-495, 2018

Mandates: US National Science Foundation, US Department of Defense

Gradient descent on neural networks typically occurs at the edge of stability

JM Cohen, S Kaur, Y Li, JZ Kolter, A Talwalkar

arXiv preprint arXiv:2103.00065, 2021

Mandates: US National Science Foundation, US Department of Defense

LazySVD: Even faster SVD decomposition yet without agonizing pain

Z Allen-Zhu, Y Li

Advances in neural information processing systems 29, 2016

Mandates: US National Science Foundation

Towards understanding the mixture-of-experts layer in deep learning

Z Chen, Y Deng, Y Wu, Q Gu, Y Li

Advances in neural information processing systems 35, 23049-23062, 2022

Mandates: US National Science Foundation

Much faster algorithms for matrix scaling

Z Allen-Zhu, Y Li, R Oliveira, A Wigderson

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS …, 2017

Mandates: US National Science Foundation

The probability flow ODE is provably fast

S Chen, S Chewi, H Lee, Y Li, J Lu, A Salim

Advances in Neural Information Processing Systems 36, 68552-68575, 2023

Mandates: US National Science Foundation

Operator scaling via geodesically convex optimization, invariant theory and polynomial identity testing

Z Allen-Zhu, A Garg, Y Li, R Oliveira, A Wigderson

Proceedings of the 50th annual ACM SIGACT symposium on theory of computing …, 2018

Mandates: US National Science Foundation

Near-optimal method for highly smooth convex optimization

S Bubeck, Q Jiang, YT Lee, Y Li, A Sidford

Conference on Learning Theory, 492-507, 2019

Mandates: US National Science Foundation

Near Optimal Methods for Minimizing Convex Functions with Lipschitz p-th Derivatives

A Gasnikov, P Dvurechensky, E Gorbunov, E Vorontsova, ...

Conference on Learning Theory, 1392-1393, 2019

Mandates: US National Science Foundation, National Natural Science Foundation of China

How do transformers learn topic structure: Towards a mechanistic understanding

Y Li, Y Li, A Risteski

International Conference on Machine Learning, 19689-19729, 2023

Mandates: US National Science Foundation

Near-optimal discrete optimization for experimental design: A regret minimization approach

Z Allen-Zhu, Y Li, A Singh, Y Wang

Mathematical Programming 186, 439-478, 2021

Mandates: US National Science Foundation, US Department of Defense

PET: Optimizing tensor programs with partially equivalent transformations and automated corrections

H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ...

15th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2021

Mandates: National Natural Science Foundation of China

Complexity of highly parallel non-smooth convex optimization

S Bubeck, Q Jiang, YT Lee, Y Li, A Sidford

Advances in neural information processing systems 32, 2019

Mandates: US National Science Foundation

Near-optimal design of experiments via regret minimization

Z Allen-Zhu, Y Li, A Singh, Y Wang

International Conference on Machine Learning, 126-135, 2017

Mandates: US National Science Foundation

Linear convergence of a Frank-Wolfe type algorithm over trace-norm balls

Z Allen-Zhu, E Hazan, W Hu, Y Li

Advances in neural information processing systems 30, 2017

Mandates: US National Science Foundation