Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU VW Lee, C Kim, J Chhugani, M Deisher, D Kim, AD Nguyen, N Satish, ... Proceedings of the 37th annual international symposium on Computer …, 2010 | 1220 | 2010 |
Larrabee: a many-core x86 architecture for visual computing L Seiler, D Carmean, E Sprangle, T Forsyth, M Abrash, P Dubey, ... ACM Transactions on Graphics (TOG) 27 (3), 1-15, 2008 | 1216 | 2008 |
Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs C Kim, T Kaldewey, VW Lee, E Sedlar, AD Nguyen, N Satish, J Chhugani, ... Proceedings of the VLDB Endowment 2 (2), 1378-1389, 2009 | 443 | 2009 |
FAST: fast architecture sensitive tree search on modern CPUs and GPUs C Kim, J Chhugani, N Satish, E Sedlar, AD Nguyen, T Kaldewey, VW Lee, ... Proceedings of the 2010 ACM SIGMOD International Conference on Management of …, 2010 | 441 | 2010 |
AltiVec extension to PowerPC accelerates media processing K Diefendorff, PK Dubey, R Hochsprung, H Scales IEEE Micro 20 (2), 85-95, 2000 | 438 | 2000 |
ClearPath: highly parallel collision avoidance for multi-agent simulation SJ Guy, J Chhugani, C Kim, N Satish, M Lin, D Manocha, P Dubey Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer …, 2009 | 436 | 2009 |
GraphMat: High performance graph analytics made productive N Sundaram, NR Satish, MMA Patwary, SR Dulloor, SG Vadlamudi, ... arXiv preprint arXiv:1503.07241, 2015 | 408 | 2015 |
3.5-D blocking optimization for stencil computations on modern CPUs and GPUs A Nguyen, N Satish, J Chhugani, C Kim, P Dubey SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High …, 2010 | 402 | 2010 |
PLEdestrians: A Least-Effort Approach to Crowd Simulation. SJ Guy, J Chhugani, S Curtis, P Dubey, MC Lin, D Manocha Symposium on computer animation, 119-128, 2010 | 377 | 2010 |
A study of BFLOAT16 for deep learning training D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ... arXiv preprint arXiv:1905.12322, 2019 | 364 | 2019 |
Efficient sparse matrix-vector multiplication on x86-based many-core processors X Liu, M Smelyanskiy, E Chow, P Dubey Proceedings of the 27th international ACM conference on International …, 2013 | 342 | 2013 |
How multimedia workloads will change processor design K Diefendorff, PK Dubey Computer 30 (9), 43-45, 1997 | 342 | 1997 |
Efficient implementation of sorting on multi-core SIMD CPU architecture J Chhugani, AD Nguyen, VW Lee, W Macy, M Hagog, YK Chen, A Baransi, ... Proceedings of the VLDB Endowment 1 (2), 1313-1324, 2008 | 339 | 2008 |
Efficient Rijndael encryption implementation with composite field arithmetic A Rudra, PK Dubey, CS Jutla, V Kumar, JR Rao, P Rohatgi Cryptographic Hardware and Embedded Systems—CHES 2001: Third International …, 2001 | 334 | 2001 |
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort N Satish, C Kim, J Chhugani, AD Nguyen, VW Lee, D Kim, P Dubey Proceedings of the 2010 ACM SIGMOD International Conference on Management of …, 2010 | 322 | 2010 |
Faster CNNs with direct sparse convolutions and guided pruning J Park, S Li, W Wen, PTP Tang, H Li, Y Chen, P Dubey arXiv preprint arXiv:1608.01409, 2016 | 316* | 2016 |
ScaleDeep: A scalable compute architecture for learning and evaluating deep networks S Venkataramani, A Ranjan, S Banerjee, D Das, S Avancha, ... Proceedings of the 44th Annual International Symposium on Computer …, 2017 | 279 | 2017 |
Second Life and the new generation of virtual worlds S Kumar, J Chhugani, C Kim, D Kim, A Nguyen, P Dubey, C Bienia, Y Kim Computer 41 (9), 46-53, 2008 | 279 | 2008 |
Platform 2015: Intel processor and platform evolution for the next decade S Borkar, P Dubey, K Kahn, D Kuck, H Mulder, S Pawlowski, J Rattner Technology 1, 30-6, 2005 | 260 | 2005 |