Overlap communication with dependent computation via decomposition in large deep learning models S Wang, J Wei, A Sabne, A Davis, B Ilbeyi, B Hechtman, D Chen, ... Proceedings of the 28th ACM International Conference on Architectural …, 2022 | 66 | 2022 |
Effective sampling-driven performance tools for GPU-accelerated supercomputers M Chabbi, K Murthy, M Fagan, J Mellor-Crummey Proceedings of the International Conference on High Performance Computing …, 2013 | 37 | 2013 |
On the efficacy of GPU-integrated MPI for scientific applications AM Aji, LS Panwar, F Ji, M Chabbi, K Murthy, P Balaji, KR Bisset, J Dinan, ... Proceedings of the 22nd international symposium on High-performance parallel …, 2013 | 34 | 2013 |
A flexible approach to autotuning multi-pass machine learning compilers PM Phothilimthana, A Sabne, N Sarda, KS Murthy, Y Zhou, ... 2021 30th International Conference on Parallel Architectures and Compilation …, 2021 | 33 | 2021 |
MPI-ACC: accelerator-aware MPI for scientific applications AM Aji, LS Panwar, F Ji, K Murthy, M Chabbi, P Balaji, KR Bisset, J Dinan, ... IEEE Transactions on Parallel and Distributed Systems 27 (5), 1401-1414, 2015 | 28 | 2015 |
Automatically arranging objects in a graphical program block diagram A Kodaganur, AJ Singri, A Prasad, KS Murthy, C Smith, B Dev US Patent 8,479,218, 2013 | 26 | 2013 |
Optimized distributed work-stealing V Kumar, K Murthy, V Sarkar, Y Zheng 2016 6th Workshop on Irregular Applications: Architecture and Algorithms …, 2016 | 20 | 2016 |
Managing asynchronous operations in Coarray Fortran 2.0 C Yang, K Murthy, J Mellor-Crummey 2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013 | 12 | 2013 |
DisMaRC: A distributed map reduce framework on CUDA A Mooley, K Murthy, H Singh University of Texas, Austin, Tech. Rep, 2009 | 12 | 2009 |
Early evaluation of scalable fabric interface for PGAS programming models M Luo, K Seager, KS Murthy, CJ Archer, S Sur, S Hefty Proceedings of the 8th International Conference on Partitioned Global …, 2014 | 9 | 2014 |
Communication avoiding algorithms: Analysis and code generation for parallel systems K Murthy, J Mellor-Crummey 2015 International Conference on Parallel Architecture and Compilation (PACT …, 2015 | 8 | 2015 |
Design and verification of distributed phasers K Murthy, SR Paul, KS Meel, T Cogumbreiro, J Mellor-Crummey Euro-Par 2016: Parallel Processing: 22nd International Conference on …, 2016 | 7 | 2016 |
A compiler transformation to overlap communication with dependent computation K Murthy, J Mellor-Crummey 2015 9th International Conference on Partitioned Global Address Space …, 2015 | 6 | 2015 |
HPCToolkit: a tool for performance analysis on heterogeneous supercomputers M Chabbi, K Murthy, M Fagan, J Mellor-Crummey GTC 2013 San Jose, 2013 | 5 | 2013 |
Class II submission to the HPC Challenge award competition Coarray Fortran 2.0 J Mellor-Crummey, L Adhianto, G Jin, M Krentel, K Murthy, W Scherer, ... Submission to the, 2011 | 4 | 2011 |
TensorRight: Automated Verification of Tensor Graph Rewrites J Arora, S Lu, D Jain, T Xu, F Houshmand, PM Phothilimthana, M Lesani, ... Proceedings of the ACM on Programming Languages 9 (POPL), 832-863, 2025 | | 2025 |
Code Generation for Extreme Scale Parallel Systems K Srinivasa Murthy | | 2017 |
Distributed Phasers SR Paul, K Murthy, KS Meel, J Mellor-Crummey arXiv preprint arXiv:1512.07305, 2015 | | 2015 |
Design and Implementation of Quality Optimized Image Scaling Processor using VLSI Technology M Nikhitha, KSR Murthy | | 2015 |
Development of an Implicit, Charge and Energy Conserving 2D Electromagnetic PIC Code on Advanced Architectures J Payne, W Taitano, D Knoll, C Liebs, K Murthy, N Feltman, Y Wang, ... APS Division of Plasma Physics Meeting Abstracts 54, UP8.023, 2012 | | 2012 |