Accelerating reduction and scan using tensor core units A Dakkak, C Li, J Xiong, I Gelado, W Hwu Proceedings of the ACM International Conference on Supercomputing, 46-57, 2019 | 105 | 2019 |
Evaluating characteristics of CUDA communication primitives on high-bandwidth interconnects C Pearson, A Dakkak, S Hashash, C Li, IH Chung, J Xiong, WM Hwu Proceedings of the 2019 ACM/SPEC International Conference on Performance …, 2019 | 42 | 2019 |
TrIMS: Transparent and isolated model sharing for low latency deep learning inference in function-as-a-service A Dakkak, C Li, SG De Gonzalo, J Xiong, W Hwu 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), 372-382, 2019 | 31 | 2019 |
XSP: Across-stack profiling and analysis of machine learning models on GPUs C Li, A Dakkak, J Xiong, W Wei, L Xu, W Hwu 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020 | 29 | 2020 |
Accelerating Fourier and number theoretic transforms using tensor cores and warp shuffles S Durrani, MS Chughtai, M Hidayetoglu, R Tahir, A Dakkak, ... 2021 30th International Conference on Parallel Architectures and Compilation …, 2021 | 25 | 2021 |
WebGPU: A scalable online development platform for GPU programming courses A Dakkak, C Pearson, W Hwu 2016 IEEE International Parallel and Distributed Processing Symposium …, 2016 | 22 | 2016 |
Recovering missing depth information from Microsoft’s Kinect A Dakkak, A Husain Proc. Embedded Vis. Alliance, 1-9, 2012 | 18 | 2012 |
Enhancing the usability and utilization of accelerated architectures via Docker N Haydel, S Gesing, I Taylor, G Madey, A Dakkak, SG De Gonzalo, ... 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing …, 2015 | 16 | 2015 |
Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing C Rodrigues, T Jablin, A Dakkak, WM Hwu ACM SIGPLAN Notices 49 (8), 247-258, 2014 | 16 | 2014 |
The design and implementation of a scalable deep learning benchmarking platform C Li, A Dakkak, J Xiong, W Hwu 2020 IEEE 13th International Conference on Cloud Computing (CLOUD), 414-425, 2020 | 13 | 2020 |
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs C Li, A Dakkak, J Xiong, W Hwu 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020 | 13 | 2020 |
Tangram: a high-level language for performance portable code synthesis LW Chang, A Dakkak, CI Rodrigues, W Hwu Programmability Issues for Heterogeneous Multicores, 2015 | 12 | 2015 |
FFT blitz: the tensor cores strike back S Durrani, MS Chughtai, A Dakkak, W Hwu, L Rauchwerger Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021 | 10 | 2021 |
Transitioning HPC software to exascale heterogeneous computing WM Hwu, LW Chang, HS Kim, A Dakkak, I El Hajj 2015 Computational Electromagnetics International Workshop (CEM), 1-2, 2015 | 10 | 2015 |
MLModelScope: Evaluate and measure ML models within AI pipelines A Dakkak, C Li, A Srivastava, J Xiong, WM Hwu arXiv preprint arXiv:1811.09737, 2018 | 9 | 2018 |
A programming system for future proofing performance critical libraries LW Chang, I El Hajj, HS Kim, J Gómez-Luna, A Dakkak, W Hwu ACM SIGPLAN Notices 51 (8), 1-2, 2016 | 8 | 2016 |
Across-stack profiling and characterization of machine learning models on GPUs C Li, A Dakkak, J Xiong, W Wei, L Xu, W Hwu arXiv preprint arXiv:1908.06869, 2019 | 7 | 2019 |
Frustrated with replicating claims of a shared model? A solution A Dakkak, C Li, J Xiong, WM Hwu arXiv preprint arXiv:1811.09737, 2018 | 7 | 2018 |
Thoughts on massively-parallel heterogeneous computing for solving large problems W Hwu, M Hidayetoglu, WC Chew, C Pearson, S Garcia, S Huang, ... 2017 Computing and Electromagnetics International Workshop (CEM), 67-68, 2017 | 6 | 2017 |
MLModelScope: A distributed platform for model evaluation and benchmarking at scale A Dakkak, C Li, J Xiong, W Hwu arXiv preprint arXiv:2002.08295, 2020 | 5 | 2020 |