Mr.BiQ: Post-training non-uniform quantization based on minimizing the reconstruction error Y Jeon, C Lee, E Cho, Y Ro
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
42 2022 Multi-dimensional parallel training of Winograd layer on memory-centric architecture B Hong, Y Ro, J Kim
2018 51st Annual IEEE/ACM International Symposium on Microarchitecture …, 2018
22 2018 RingLeader: Efficiently Offloading Intra-Server Orchestration to NICs J Lin, A Cardoza, T Khan, Y Ro, BE Stephens, H Wassel, A Akella
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
12 2023 FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping A Jaiswal, B Hu, L Yin, Y Ro, S Liu, T Chen, A Akella
arXiv preprint arXiv:2404.03865, 2024
4 2024 Post-training weighted quantization of neural networks for language models SJ Kwon, D Lee, Y Jeon, B Kim, BS Park, Y Ro
3 2021 Ghost routing to enable oblivious computation on memory-centric networks Y Ro, S Jin, J Huh, J Kim
2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021
2 2021 Q-Rater: Non-convex optimization for post-training uniform quantization B Kim, D Lee, Y Ro, Y Jeon, SJ Kwon, B Park, D Oh
arXiv preprint arXiv:2105.01868, 2021
2 2021 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design R Cai, Y Ro, GW Kim, P Wang, BE Bejnordi, A Akella, Z Wang
arXiv preprint arXiv:2410.19123, 2024
2024 Electronic device and control method therefor B Kim, D Lee, K Sejung, RO Yeonju, P Baeseong, J Yongkweon
US Patent App. 18/131,164, 2023
2023 Lowering the pre-training tax for gradient-based subset training: a lightweight distributed pre-training toolkit Y Ro, Z Wang, V Chidambaram, A Akella
International Conference on Machine Learning, 29130-29142, 2023
2023 Dataset Efficient Training with Model Ensembling Y Ro, C Xu, A Ciborowska, S Bhattacharya, F Li, M Foltin
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
2023 Sequential Encryption of Sparse Neural Networks Toward Optimum Representation of Irregular Sparsity B Park, SJ Kwon, D Lee, D Oh, B Kim, Y Jeon, Y Ro
CoRR, 2021
2021 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design R Cai, Y Ro, GW Kim, P Wang, BE Bejnordi, A Akella, Z Wang
The Thirty-eighth Annual Conference on Neural Information Processing Systems
Optimizing Transformer Inference with Selective Distillation: Layerwise Conversion to Linear Attention Y Ro, Z Zhang, V Chidambaram, A Akella