Title. Authors. Venue | Cited by | Year |
NeuGraph: Parallel Deep Neural Network Computation on Large Graphs. L Ma, Z Yang, Y Miao, J Xue, M Wu, L Zhou, Y Dai. 2019 USENIX Annual Technical Conference (USENIX ATC 19), 443-458, 2019 | 282 | 2019 |
Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks. L Ma, Z Xie, Z Yang, J Xue, Y Miao, W Cui, W Hu, F Yang, L Zhang, ... 14th USENIX Symposium on Operating Systems Design and Implementation …, 2020 | 147 | 2020 |
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. S Ma, H Wang, L Ma, L Wang, W Wang, S Huang, L Dong, R Wang, J Xue, ... arXiv preprint arXiv:2402.17764, 2024 | 128 | 2024 |
SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization. S Cao, L Ma, W Xiao, C Zhang, Y Liu, L Zhang, L Nie, Z Yang. Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2019 | 93 | 2019 |
Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication. L Ma, Z Yang, H Chen, J Xue, Y Dai. 2017 USENIX Annual Technical Conference (USENIX ATC 17), 195-207, 2017 | 83 | 2017 |
BitNet: Scaling 1-bit Transformers for Large Language Models. H Wang, S Ma, L Dong, S Huang, H Wang, L Ma, F Yang, R Wang, Y Wu, ... arXiv preprint arXiv:2310.11453, 2023 | 75 | 2023 |
ROLLER: Fast and Efficient Tensor Compilation for Deep Learning. H Zhu, R Wu, Y Diao, S Ke, H Li, C Zhang, J Xue, L Ma, Y Xia, W Cui, ... 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022 | 71 | 2022 |
Architectural Implications of Graph Neural Networks. Z Zhang, J Leng, L Ma, Y Miao, C Li, M Guo. IEEE Computer Architecture Letters 19 (1), 59-62, 2020 | 58 | 2020 |
Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce. X Miao, X Nie, Y Shao, Z Yang, J Jiang, L Ma, B Cui. Proceedings of the 2021 International Conference on Management of Data, 2262 …, 2021 | 54 | 2021 |
PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network. C Tian, L Ma, Z Yang, Y Dai. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020 | 47 | 2020 |
SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute. N Zheng, B Lin, Q Zhang, L Ma, Y Yang, F Yang, Y Wang, M Yang, L Zhou. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022 | 42 | 2022 |
Towards Efficient Large-Scale Graph Neural Network Computing. L Ma, Z Yang, Y Miao, J Xue, M Wu, L Zhou, Y Dai. arXiv preprint arXiv:1810.08403, 2018 | 36 | 2018 |
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement. X Nie, X Miao, Z Wang, Z Yang, J Xue, L Ma, G Cao, B Cui. Proceedings of the ACM on Management of Data 1 (1), 1-19, 2023 | 31 | 2023 |
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-to-Sparse Gate. X Nie, X Miao, S Cao, L Ma, Q Liu, J Xue, Y Miao, Y Liu, Z Yang, B Cui. arXiv preprint arXiv:2112.14397, 2021 | 29 | 2021 |
Dense-to-Sparse Gate for Mixture-of-Experts. X Nie, S Cao, X Miao, L Ma, J Xue, Y Miao, Z Yang, Z Yang, B Cui. arXiv preprint arXiv:2112.14397, 2021 | 27 | 2021 |
Welder: Scheduling Deep Learning Memory Access via Tile-graph. Y Shi, Z Yang, J Xue, L Ma, Y Xia, Z Miao, Y Guo, F Yang, L Zhou. 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 20 | 2023 |
PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation. N Zheng, H Jiang, Q Zhang, Z Han, L Ma, Y Yang, F Yang, C Zhang, L Qiu, ... Proceedings of the 29th Symposium on Operating Systems Principles, 331-347, 2023 | 19 | 2023 |
Optimizing Dynamic Neural Networks with Brainstorm. W Cui, Z Han, L Ouyang, Y Wang, N Zheng, L Ma, Y Yang, F Yang, J Xue, ... 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 13 | 2023 |
Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning. C Zhang, L Ma, J Xue, Y Shi, Z Miao, F Yang, J Zhai, Z Yang, M Yang. 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 10 | 2023 |
Accelerating GNN Training with Locality-Aware Partial Execution. T Kim, C Hwang, KS Park, Z Lin, P Cheng, Y Miao, L Ma, Y Xiong. Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems, 34-41, 2021 | 10 | 2021 |