Model parallelism optimization for distributed inference via decoupled CNN structure J Du, X Zhu, M Shen, Y Du, Y Lu, N Xiao, X Liao IEEE Transactions on Parallel and Distributed Systems 32 (7), 1665-1676, 2020 | 32 | 2020 |
A distributed in-situ CNN inference system for IoT applications J Du, M Shen, Y Du 2020 IEEE 38th International Conference on Computer Design (ICCD), 279-287, 2020 | 16 | 2020 |
Optimizing small channel 3D convolution on GPU with tensor core J Jiang, D Huang, J Du, Y Lu, X Liao Parallel Computing 113, 102954, 2022 | 10 | 2022 |
Galaxy: A resource-efficient collaborative edge AI system for in-situ transformer inference S Ye, J Du, L Zeng, W Ou, X Chu, Y Lu, X Chen IEEE INFOCOM 2024-IEEE Conference on Computer Communications, 1001-1010, 2024 | 8 | 2024 |
Handling heavy-tailed input of transformer inference on GPUs J Du, J Jiang, Y You, D Huang, Y Lu Proceedings of the 36th ACM International Conference on Supercomputing, 1-11, 2022 | 8 | 2022 |
Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference J Du, J Wei, J Jiang, S Cheng, D Huang, Z Chen, Y Lu Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and …, 2024 | 7 | 2024 |
Improving computation and memory efficiency for real-world transformer inference on GPUs J Du, J Jiang, J Zheng, H Zhang, D Huang, Y Lu ACM Transactions on Architecture and Code Optimization 20 (4), 1-22, 2023 | 7 | 2023 |
Full-stack optimizing transformer inference on ARM many-core CPU J Jiang, J Du, D Huang, Z Chen, Y Lu, X Liao IEEE Transactions on Parallel and Distributed Systems 34 (7), 2221-2235, 2023 | 7 | 2023 |
Characterizing and optimizing transformer inference on ARM many-core processor J Jiang, J Du, D Huang, D Li, J Zheng, Y Lu Proceedings of the 51st International Conference on Parallel Processing, 1-11, 2022 | 6 | 2022 |
P-sobi: A parallel implementation for second order blind identification algorithm H Li, J Du, Y Du, Z Chen, N Xiao 2019 IEEE 21st International Conference on High Performance Computing and …, 2019 | 5 | 2019 |
ATP: Adaptive Tensor Parallelism for Foundation Models S Cheng, Z Liu, J Du, Y You arXiv preprint arXiv:2301.08658, 2023 | 4 | 2023 |
EnergonAI: An inference system for 10-100 billion parameter transformer models J Du, Z Liu, J Fang, S Li, Y Li, Y Lu, Y You arXiv preprint arXiv:2209.02341, 2022 | 4 | 2022 |
Hierarchical model parallelism for optimizing inference on many-core processor via decoupled 3D-CNN structure J Jiang, Z Huang, D Huang, J Du, L Chen, Z Chen, Y Lu ACM Transactions on Architecture and Code Optimization 20 (3), 1-21, 2023 | 2 | 2023 |
CosNAS: Enhancing estimation on cosmological parameters via neural architecture search Y Wen, W Yu, D Li, J Du, D Huang, N Xiao New Astronomy 99, 101955, 2023 | 2 | 2023 |
Enhancing Distributed In-Situ CNN Inference in the Internet of Things J Du, Y Du, D Huang, Y Lu, X Liao IEEE Internet of Things Journal 9 (17), 15511-15524, 2022 | 2 | 2022 |
Optimizing massively parallel sparse matrix computing on ARM many-core processor J Zheng, J Jiang, J Du, D Huang, Y Lu Parallel Computing 117, 103035, 2023 | 1 | 2023 |
Concerto: Automatic Communication Optimization and Scheduling for Large-Scale Deep Learning S Cheng, S Lin, L Diao, H Wu, S Wang, C Si, Z Liu, X Zhao, J Du, W Lin, ... Proceedings of the 30th ACM International Conference on Architectural …, 2025 | | 2025 |
ORFA: Exploring WebAssembly as a Turing Complete Query Language for Web APIs Y Gu, C Chen, J Du, X Zhang, X Zhang The Web Conference 2025, 2025 | | 2025 |
Co-designing Transformer Architectures for Distributed Inference with Low Communication J Du, Y Wei, S Ye, J Jiang, X Chen, D Huang, Y Lu IEEE Transactions on Parallel and Distributed Systems, 2024 | | 2024 |
APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes Y Wei, J Du, J Jiang, X Shi, X Zhang, D Huang, N Xiao, Y Lu SC24: International Conference for High Performance Computing, Networking …, 2024 | | 2024 |