High accuracy digital image correlation powered by GPU-based parallel computing L Zhang, T Wang, Z Jiang, Q Kemao, Y Liu, Z Liu, L Tang, S Dong Optics and Lasers in Engineering 69, 7-12, 2015 | 112 | 2015 |
Matrix engines for high performance computing: A paragon of performance or grasping at straws? J Domke, E Vatai, A Drozd, P Chen, Y Oyama, L Zhang, S Salaria, ... 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 38 | 2021 |
Heterogeneous parallel computing accelerated iterative subpixel digital image correlation JW Huang, LQ Zhang, ZY Jiang, SB Dong, W Chen, YP Liu, ZJ Liu, ... Science China Technological Sciences 61, 74-85, 2018 | 30 | 2018 |
A study of single and multi-device synchronization methods in Nvidia GPUs L Zhang, M Wahib, H Zhang, S Matsuoka 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020 | 26 | 2020 |
Scaling distributed deep learning workloads beyond the memory capacity with KARMA M Wahib, H Zhang, TT Nguyen, A Drozd, J Domke, L Zhang, R Takano, ... SC20: International Conference for High Performance Computing, Networking …, 2020 | 24 | 2020 |
Understanding the overheads of launching CUDA kernels L Zhang, M Wahib, S Matsuoka ICPP19, 5-8, 2019 | 15 | 2019 |
PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead L Zhang, C Liu, S Dong Genes 10 (11), 886, 2019 | 13 | 2019 |
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads J Domke, E Vatai, B Gerofi, Y Kodama, M Wahib, A Podobas, S Mittal, ... arXiv preprint arXiv:2204.02235, 2022 | 9 | 2022 |
PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications L Zhang, M Wahib, P Chen, J Meng, X Wang, T Endo, S Matsuoka Proceedings of the 37th International Conference on Supercomputing, 167-179, 2023 | 5 | 2023 |
Persistent Kernels for Iterative Memory-bound GPU Applications L Zhang, M Wahib, P Chen, J Meng, X Wang, S Matsuoka arXiv preprint arXiv:2204.02064, 2022 | 4 | 2022 |
Revisiting Temporal Blocking Stencil Optimizations L Zhang, M Wahib, P Chen, J Meng, X Wang, T Endo, S Matsuoka Proceedings of the 37th International Conference on Supercomputing, 251-263, 2023 | 3 | 2023 |
Investigating Nvidia GPU Architecture Trends via Microbenchmarks L Zhang, R Barton, P Chen, X Wang, T Endo, S Matsuoka, M Wahib 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER …, 2024 | | 2024 |
Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt) L Zhang, M Wahib, P Chen, J Meng, X Wang, T Endo, S Matsuoka Proceedings of the 15th Workshop on General Purpose Processing Using GPU, 34-35, 2023 | | 2023 |
A Study of Synchronization Methods in Modern GPUs L Zhang, M Wahib, H Zhang, S Matsuoka | | 2019 |
Breaking the limitation of GPU Memory for Deep Learning Workloads H Zhang, M Wahib, L Zhang, Y Tsuji, S Matsuoka | | 2019 |
GPU Accelerated High Accuracy Digital Volume Correlation T Wang, L Zhang, Z Jiang, K Qian International Digital Imaging Correlation Society: Proceedings of the First …, 2017 | | 2017 |