Xiaoxia (Shirley) Wu 吴晓霞


Title	Cited by	Year
Phi-3 technical report: A highly capable language model locally on your phone. M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ... arXiv preprint arXiv:2404.14219, 2024	473	2024
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes. R Ward, X Wu, L Bottou. Journal of Machine Learning Research 21 (219), 1-30, 2020	351	2020
ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers. Z Yao, R Yazdani Aminabadi, M Zhang, X Wu, C Li, Y He. Advances in Neural Information Processing Systems 35, 27168-27183, 2022	325	2022
When do curricula work? X Wu, E Dyer, B Neyshabur. arXiv preprint arXiv:2012.03107, 2020	136	2020
WNGrad: Learn the learning rate in gradient descent. X Wu, R Ward, L Bottou. arXiv preprint arXiv:1803.02865, 2018	93	2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization. R Ward, X Wu, L Bottou. arXiv preprint arXiv:1806.01811, 2018	89	2018
ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation. Z Yao, X Wu, C Li, S Youn, Y He. arXiv preprint arXiv:2303.08302, 2023	73*	2023
Global convergence of adaptive gradient methods for an over-parameterized neural network. X Wu, SS Du, R Ward. arXiv preprint arXiv:1902.07111, 2019	69	2019
Hierarchical learning for generation with long source sequences. T Rohde, X Wu, Y Liu. arXiv preprint arXiv:2104.07545, 2021	61	2021
Linear convergence of adaptive stochastic gradient descent. Y Xie, X Wu, R Ward. International Conference on Artificial Intelligence and Statistics, 1475-1485, 2020	56	2020
DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales. Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ... arXiv preprint arXiv:2308.01320, 2023	53	2023
Choosing the sample with lowest loss makes SGD robust. V Shah, X Wu, S Sanghavi. International Conference on Artificial Intelligence and Statistics, 2120-2130, 2020	49	2020
Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases. X Wu, C Li, RY Aminabadi, Z Yao, Y He. arXiv preprint arXiv:2301.12017, 2023	37*	2023
ZeRO++: Extremely efficient collective communication for giant model training. G Wang, H Qin, SA Jacobs, C Holmes, S Rajbhandari, O Ruwase, F Yan, ... arXiv preprint arXiv:2306.10209, 2023	35	2023
Value-at-Risk estimation with stochastic interest rate models for option-bond portfolios. X Wang, D Xie, J Jiang, X Wu, J He. Finance Research Letters 21, 10-20, 2017	31	2017
Implicit regularization and convergence for weight normalization. X Wu, E Dobriban, T Ren, S Wu, Z Li, S Gunasekar, R Ward, Q Liu. Advances in Neural Information Processing Systems 33, 2835-2847, 2020	30*	2020
XTC: Extreme compression for pre-trained transformers made simple and efficient. X Wu, Z Yao, M Zhang, C Li, Y He. Advances in Neural Information Processing Systems 35, 3217-3231, 2022	26	2022
ZeroQuant-FP: A leap forward in LLMs post-training W4A8 quantization using floating-point formats. X Wu, Z Yao, Y He. arXiv preprint arXiv:2307.09782, 2023	21	2023
Renaissance: A survey into AI text-to-image generation in the era of large model. F Bie, Y Yang, Z Zhou, A Ghanem, M Zhang, Z Yao, X Wu, C Holmes, ... arXiv preprint arXiv:2309.00810, 2023	19	2023
DeepSpeed Data Efficiency: Improving deep learning model quality and training efficiency via efficient data sampling and routing. C Li, Z Yao, X Wu, M Zhang, C Holmes, C Li, Y He. Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 18490 …, 2024	16	2024
