Mikhail Smelyanskiy

Cited by

	Total	Since 2019
Citations	13201	9393
h-index	45	34
i10-index	95	73

Citations per year

Year	Citations
2009	53
2010	93
2011	178
2012	223
2013	273
2014	435
2015	470
2016	468
2017	556
2018	732
2019	1007
2020	1346
2021	1549
2022	1790
2023	1927
2024	1762

Public access

13 articles

1 article

available

not available

Based on funding mandates


Facebook

Verified email at intel.com

Deep learning HPC SW/HW co-design


Title	Authors	Venue	Cited by	Year
On large-batch training for deep learning: Generalization gap and sharp minima	NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang	arXiv preprint arXiv:1609.04836, 2016	3689	2016
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU	VW Lee, C Kim, J Chhugani, M Deisher, D Kim, AD Nguyen, N Satish, ...	Proceedings of the 37th annual international symposium on Computer …, 2010	1220	2010
Deep learning recommendation model for personalization and recommendation systems	M Naumov, D Mudigere, HJM Shi, J Huang, N Sundaraman, J Park, ...	arXiv preprint arXiv:1906.00091, 2019	791	2019
Applied machine learning at Facebook: A datacenter infrastructure perspective	K Hazelwood, S Bird, D Brooks, S Chintala, U Diril, D Dzhulgakov, ...	2018 IEEE international symposium on high performance computer architecture …, 2018	762	2018
A study of BFLOAT16 for deep learning training	D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ...	arXiv preprint arXiv:1905.12322, 2019	364	2019
Efficient sparse matrix-vector multiplication on x86-based many-core processors	X Liu, M Smelyanskiy, E Chow, P Dubey	Proceedings of the 27th international ACM conference on International …, 2013	342	2013
Glow: Graph lowering compiler techniques for neural networks	N Rotem, J Fix, S Abdulrasool, G Catron, S Deng, R Dzhabarov, N Gibson, ...	arXiv preprint arXiv:1805.00907, 2018	336	2018
The architectural implications of Facebook's DNN-based personalized recommendation	U Gupta, CJ Wu, X Wang, M Naumov, B Reagen, D Brooks, B Cottel, ...	2020 IEEE International Symposium on High Performance Computer Architecture …, 2020	331	2020
RecNMP: Accelerating personalized recommendation with near-memory processing	L Ke, U Gupta, BY Cho, D Brooks, V Chandra, U Diril, A Firoozshahian, ...	2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture …, 2020	239	2020
Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications	J Park, M Naumov, P Basu, S Deng, A Kalaiah, D Khudia, J Law, P Malani, ...	arXiv preprint arXiv:1811.09886, 2018	227	2018
Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor	A Heinecke, K Vaidyanathan, M Smelyanskiy, A Kobotov, R Dubtsov, ...	2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013	219	2013
Exploring SIMD for molecular dynamics, using Intel® Xeon® processors and Intel® Xeon Phi coprocessors	SJ Pennycook, CJ Hughes, M Smelyanskiy, SA Jarvis	2013 IEEE 27th International symposium on parallel and distributed …, 2013	216	2013
qHiPSTER: The quantum high performance software testing environment	M Smelyanskiy, NPD Sawaya, A Aspuru-Guzik	arXiv preprint arXiv:1601.07195, 2016	185	2016
Practical optimization for hybrid quantum-classical algorithms	GG Guerreschi, M Smelyanskiy	arXiv preprint arXiv:1701.01450, 2017	177	2017
Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers	A Heinecke, A Breuer, S Rettenberger, M Bader, AA Gabriel, C Pelties, ...	SC'14: Proceedings of the International Conference for High Performance …, 2014	176	2014
Anatomy of high-performance many-threaded matrix multiplication	TM Smith, R Van De Geijn, M Smelyanskiy, JR Hammond, FG Van Zee	2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014	172	2014
Can traditional programming bridge the ninja performance gap for parallel computing applications?	N Satish, C Kim, J Chhugani, H Saito, R Krishnaiyer, M Smelyanskiy, ...	ACM SIGARCH Computer Architecture News 40 (3), 440-451, 2012	149	2012
Convergence of recognition, mining, and synthesis workloads and its implications	YK Chen, J Chhugani, P Dubey, CJ Hughes, D Kim, S Kumar, VW Lee, ...	Proceedings of the IEEE 96 (5), 790-807, 2008	149	2008
The BLIS framework: Experiments in portability	FG Van Zee, TM Smith, B Marker, TM Low, RAVD Geijn, FD Igual, ...	ACM Transactions on Mathematical Software (TOMS) 42 (2), 1-19, 2016	130	2016
On large-batch training for deep learning: Generalization gap and sharp minima. arXiv 2016	NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang	arXiv preprint arXiv:1609.04836, 2020	124	2020
