McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures Sheng Li, Jung Ho Ahn, Richard D Strong, Jay B Brockman, Dean M Tullsen, Norman P Jouppi Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International …, 2009 | 3548* | 2009 |
Ten lessons from three generations shaped Google’s TPUv4i: Industrial product NP Jouppi, DH Yoon, M Ashcraft, M Gottscho, TB Jablin, G Kurian, ... 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021 | 380 | 2021 |
CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques S Li, K Chen, JH Ahn, JB Brockman, NP Jouppi 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 694-701, 2011 | 337 | 2011 |
Kiln: Closing the performance gap between systems with and without persistence support J Zhao, S Li, DH Yoon, Y Xie, NP Jouppi Proceedings of the 46th Annual IEEE/ACM International Symposium on …, 2013 | 318 | 2013 |
A domain-specific supercomputer for training deep neural networks NP Jouppi, DH Yoon, G Kurian, S Li, N Patil, J Laudon, C Young, ... Communications of the ACM 63 (7), 67-78, 2020 | 316 | 2020 |
Faster CNNs with direct sparse convolutions and guided pruning J Park, S Li, W Wen, PTP Tang, H Li, Y Chen, P Dubey arXiv preprint arXiv:1608.01409, 2016 | 290 | 2016 |
TPU v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings N Jouppi, G Kurian, S Li, P Ma, R Nagarajan, L Nai, N Patil, ... Proceedings of the 50th Annual International Symposium on Computer …, 2023 | 264 | 2023 |
CACTI-3DD: Architecture-level Modeling for 3D Die-stacked DRAM Main Memory K Chen, S Li, N Muralimanohar, JH Ahn, JB Brockman, NP Jouppi | 258* | |
Architecting to achieve a billion requests per second throughput on a single key-value store server platform S Li, H Lim, VW Lee, JH Ahn, A Kalia, M Kaminsky, DG Andersen, ... Proceedings of the 42nd Annual International Symposium on Computer …, 2015 | 171 | 2015 |
McSimA+: A manycore simulator with application-level+ simulation and detailed microarchitecture modeling JH Ahn, S Li, O Seongil, NP Jouppi 2013 IEEE International Symposium on Performance Analysis of Systems and …, 2013 | 167 | 2013 |
Performing power management in a multicore processor VW Lee, ET Grochowski, D Kim, Y Bai, S Li, NK Mellempudi, ... US Patent 10,234,930, 2019 | 130 | 2019 |
The design process for Google's training chips: TPUv2 and TPUv3 T Norrie, N Patil, DH Yoon, G Kurian, S Li, J Laudon, C Young, N Jouppi, ... IEEE Micro 41 (2), 56-63, 2021 | 117 | 2021 |
Parallelizing word2vec in shared and distributed memory S Ji, N Satish, S Li, PK Dubey IEEE Transactions on Parallel and Distributed Systems 30 (9), 2090-2100, 2019 | 101 | 2019 |
Methods and apparatus to perform error detection and correction S Li, NP Jouppi, N Muralimanohar US Patent 8,788,904, 2014 | 87 | 2014 |
Separate memory controllers to access data in memory DH Yoon, S Li, J Chang, K Chen, P Ranganathan, NP Jouppi US Patent 10,691,344, 2020 | 74 | 2020 |
System implications of memory reliability in exascale computing S Li, K Chen, MY Hsieh, N Muralimanohar, CD Kersey, JB Brockman, ... Proceedings of 2011 International Conference for High Performance Computing …, 2011 | 71 | 2011 |
Enabling sparse Winograd convolution by native pruning S Li, J Park, PTP Tang arXiv preprint arXiv:1702.08597, 2017 | 69 | 2017 |
Memory network with memory nodes controlling memory accesses in the memory network S Li, NP Jouppi, P Faraboschi, MR Krause US Patent 10,572,150, 2020 | 55 | 2020 |
Memory network to route memory traffic and I/O traffic DL Barron, P Faraboschi, NP Jouppi, MR Krause, S Li US Patent 9,952,975, 2018 | 48 | 2018 |
System-level integrated server architectures for scale-out datacenters S Li, K Lim, P Faraboschi, J Chang, P Ranganathan, NP Jouppi Proceedings of the 44th Annual IEEE/ACM International Symposium on …, 2011 | 48 | 2011 |