Nathan DeBardeleben
Research Scientist, Los Alamos National Laboratory
Verified email at lanl.gov
Title · Cited by · Year
Addressing failures in exascale computing
M Snir, RW Wisniewski, JA Abraham, SV Adve, S Bagchi, P Balaji, J Belak, ...
The International Journal of High Performance Computing Applications 28 (2 …, 2014
Cited by 538 · 2014
Memory errors in modern systems: The good, the bad, and the ugly
V Sridharan, N DeBardeleben, S Blanchard, KB Ferreira, J Stearley, ...
ACM SIGARCH Computer Architecture News 43 (1), 297-310, 2015
Cited by 398 · 2015
Feng shui of supercomputer memory: Positional effects in DRAM and SRAM faults
V Sridharan, J Stearley, N DeBardeleben, S Blanchard, S Gurumurthi
Proceedings of the International Conference on High Performance Computing …, 2013
Cited by 244 · 2013
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation
D Tiwari, S Gupta, J Rogers, D Maxwell, P Rech, S Vazhkudai, D Oliveira, ...
2015 IEEE 21st International Symposium on High Performance Computer …, 2015
Cited by 205 · 2015
On the diversity of cluster workloads and its impact on research results
G Amvrosiadis, JW Park, GR Ganger, GA Gibson, E Baseman, ...
2018 USENIX Annual Technical Conference (USENIX ATC 18), 533-546, 2018
Cited by 171 · 2018
BinFI: An efficient fault injector for safety-critical machine learning systems
Z Chen, G Li, K Pattabiraman, N DeBardeleben
Proceedings of the International Conference for High Performance Computing …, 2019
Cited by 127 · 2019
GPGPUs: How to Combine High Computational Power with High Reliability
LB Gomez, F Cappello, L Carro, N DeBardeleben, B Fang, S Gurumurthi, ...
Cited by 89 · 2014
F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability
Q Guan, N DeBardeleben, S Blanchard, S Fu
Proceedings of the 2014 IEEE 28th International Parallel and Distributed …, 2014
Cited by 88 · 2014
High-end computing resilience: Analysis of issues facing the HEC community and path-forward for research and development
N DeBardeleben, J Laros, JT Daly, SL Scott, C Engelmann, B Harrod
Whitepaper, Dec, 2009
Cited by 87 · 2009
TensorFI: A flexible fault injection framework for TensorFlow applications
Z Chen, N Narayanan, B Fang, G Li, K Pattabiraman, N DeBardeleben
2020 IEEE 31st International Symposium on Software Reliability Engineering …, 2020
Cited by 86 · 2020
TensorFI: A configurable fault injector for TensorFlow applications
G Li, K Pattabiraman, N DeBardeleben
2018 IEEE International symposium on software reliability engineering …, 2018
Cited by 77 · 2018
Experimental and analytical study of Xeon Phi reliability
D Oliveira, L Pilla, N DeBardeleben, S Blanchard, H Quinn, I Koren, ...
Proceedings of the International Conference for High Performance Computing …, 2017
Cited by 62 · 2017
Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters
WM Jones, JT Daly, N DeBardeleben
Proceedings of the 19th ACM International Symposium on High Performance …, 2010
Cited by 60 · 2010
Towards practical algorithm based fault tolerance in dense linear algebra
P Wu, Q Guan, N DeBardeleben, S Blanchard, D Tao, X Liang, J Chen, ...
Proceedings of the 25th ACM International Symposium on High-Performance …, 2016
Cited by 56 · 2016
Application monitoring and checkpointing in HPC: Looking towards exascale systems
WM Jones, JT Daly, N DeBardeleben
Proceedings of the 50th annual ACM Southeast Conference, 262-267, 2012
Cited by 52 · 2012
Lessons learned from memory errors observed over the lifetime of Cielo
S Levy, KB Ferreira, N DeBardeleben, T Siddiqua, V Sridharan, ...
SC18: International Conference for High Performance Computing, Networking …, 2018
Cited by 50 · 2018
Inter-agency workshop on HPC resilience at extreme scale
J Daly, B Harrod, T Hoang, L Nowell, B Adolf, S Borkar, N DeBardeleben, ...
National Security Agency Advanced Computing Systems, 2012
Cited by 45 · 2012
TSM2: Optimizing tall-and-skinny matrix-matrix multiplication on GPUs
J Chen, N Xiong, X Liang, D Tao, S Li, K Ouyang, K Zhao, ...
Proceedings of the ACM International Conference on Supercomputing, 106-116, 2019
Cited by 43 · 2019
GPU behavior on a large HPC cluster
N DeBardeleben, S Blanchard, L Monroe, P Romero, D Grunau, C Idler, ...
Euro-Par 2013: Parallel Processing Workshops: BigDataCloud, DIHC, FedICI …, 2014
Cited by 42 · 2014
Quantifying memory underutilization in HPC systems and using it to improve performance via architecture support
G Panwar, D Zhang, Y Pang, M Dahshan, N DeBardeleben, B Ravindran, ...
Proceedings of the 52nd Annual IEEE/ACM International Symposium on …, 2019
Cited by 39 · 2019