Kernel weaver: Automatically fusing database primitives for efficient GPU computation H Wu, G Diamos, S Cadambi, S Yalamanchili 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 107-118, 2012 | 140 | 2012 |
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications J Lee, H Wu, M Ravichandran, N Clark Proceedings of the 37th annual international symposium on Computer …, 2010 | 128 | 2010 |
SIMD re-convergence at thread frontiers G Diamos, B Ashbaugh, S Maiyuran, A Kerr, H Wu, S Yalamanchili Proceedings of the 44th annual IEEE/ACM international symposium on …, 2011 | 113 | 2011 |
Red fox: An execution environment for relational query processing on GPUs H Wu, G Diamos, T Sheard, M Aref, S Baxter, M Garland, S Yalamanchili Proceedings of Annual IEEE/ACM International Symposium on Code Generation …, 2014 | 108 | 2014 |
Optimizing data warehousing applications for GPUs using kernel fusion/fission H Wu, G Diamos, J Wang, S Cadambi, S Yalamanchili, S Chakradhar 2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012 | 92 | 2012 |
Efficient relational algebra algorithms and data structures for GPU GF Diamos, H Wu, A Lele, J Wang Georgia Institute of Technology, 2012 | 44 | 2012 |
Relational algorithms for multi-bulk-synchronous processors G Diamos, H Wu, J Wang, A Lele, S Yalamanchili ACM SIGPLAN Notices 48 (8), 301-302, 2013 | 30 | 2013 |
Characterization and transformation of unstructured control flow in GPU applications H Wu, G Diamos, S Li, S Yalamanchili 1st international workshop on characterizing applications for heterogeneous …, 2011 | 29 | 2011 |
Optimizing data warehousing applications for GPUs using dynamic stream scheduling and dispatch of fused and split kernels H Wu, S Cadambi, ST Chakradhar US Patent 8,990,827, 2015 | 26 | 2015 |
Cutlass V Thakkar, P Ramani, C Cecka, A Shivam, H Lu, E Yan, J Kosaian, ... github, 2023 | 24 | 2023 |
Multipredicate join algorithms for accelerating relational graph processing on GPUs H Wu, D Zinn, M Aref, S Yalamanchili International Workshop on Accelerating Data Management Systems Using Modern …, 2014 | 22 | 2014 |
Accelerating simulation of agent-based models on heterogeneous architectures J Wang, N Rubin, H Wu, S Yalamanchili Proceedings of the 6th Workshop on General Purpose Processor Using Graphics …, 2013 | 19 | 2013 |
Relational learning with GPUs: Accelerating rule coverage CA Martínez-Angeles, H Wu, I Dutra, VS Costa, J Buenabad-Chávez International Journal of Parallel Programming 44 (3), 663-685, 2016 | 18 | 2016 |
Characterization and transformation of unstructured control flow in bulk synchronous GPU applications H Wu, G Diamos, J Wang, S Li, S Yalamanchili International Journal of High Performance Computing Applications 26 (2), 170-185, 2012 | 18 | 2012 |
Cutlass A Kerr, H Wu, M Gupta, D Blasig, P Ramini, D Merrill, A Shivam, ... NVIDIA/cutlass, 2022 | 10 | 2022 |
Satisfying data-intensive queries using GPU clusters J Young, H Wu, S Yalamanchili 2012 SC Companion: High Performance Computing, Networking Storage and …, 2012 | 10 | 2012 |
General-purpose join algorithms for large graph triangle listing on heterogeneous systems D Zinn, H Wu, J Wang, M Aref, S Yalamanchili Proceedings of the 9th Annual Workshop on General Purpose Processing Using …, 2016 | 8 | 2016 |
CUTLASS, January 2023 V Thakkar, P Ramani, C Cecka, A Shivam, H Lu, E Yan, J Kosaian, ... URL https://github.com/NVIDIA/cutlass, 0 | 8 | |
An efficient block motion estimation method on CELL BE X He, Y Zhang, X He, H Wu, Y Zou 2008 International Conference on Audio, Language and Image Processing, 1672-1676, 2008 | 4 | 2008 |
Evt: Accelerating deep learning training with epilogue visitor tree Z Chen, A Kerr, R Cai, J Kosaian, H Wu, Y Ding, Y Xie Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 3 | 2024 |