System log pre-processing to improve failure prediction Z Zheng, Z Lan, BH Park, A Geist 2009 IEEE/IFIP International Conference on Dependable Systems & Networks …, 2009 | 150 | 2009 |
Toward automated anomaly identification in large-scale systems Z Lan, Z Zheng, Y Li IEEE Transactions on Parallel and Distributed Systems 21 (2), 174-187, 2009 | 148 | 2009 |
Fault-aware, utility-based job scheduling on Blue Gene/P systems W Tang, Z Lan, N Desai, D Buettner 2009 IEEE International Conference on Cluster Computing and Workshops, 1-10, 2009 | 123 | 2009 |
Integrating dynamic pricing of electricity into energy aware scheduling for HPC systems X Yang, Z Zhou, S Wallace, Z Lan, W Tang, S Coghlan, ME Papka Proceedings of the International Conference on High Performance Computing …, 2013 | 120 | 2013 |
Exploit failure prediction for adaptive fault-tolerance in cluster computing Y Li, Z Lan Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID …, 2006 | 115 | 2006 |
Watch out for the bully! job interference study on dragonfly network X Yang, J Jenkins, M Mubarak, RB Ross, Z Lan SC'16: Proceedings of the International Conference for High Performance …, 2016 | 110 | 2016 |
Lightweight silent data corruption detection based on runtime data analysis for HPC applications E Berrocal, L Bautista-Gomez, S Di, Z Lan, F Cappello Proceedings of the 24th International Symposium on High-Performance Parallel …, 2015 | 103 | 2015 |
Co-analysis of RAS log and job log on Blue Gene/P Z Zheng, L Yu, W Tang, Z Lan, R Gupta, N Desai, S Coghlan, D Buettner 2011 IEEE international parallel & distributed processing symposium, 840-851, 2011 | 103 | 2011 |
Practical online failure prediction for Blue Gene/P: Period-based vs event-driven L Yu, Z Zheng, Z Lan, S Coghlan 2011 IEEE/IFIP 41st International Conference on Dependable Systems and …, 2011 | 94 | 2011 |
Analyzing and adjusting user runtime estimates to improve job scheduling on the Blue Gene/P W Tang, N Desai, D Buettner, Z Lan 2010 IEEE international symposium on parallel & distributed processing …, 2010 | 93 | 2010 |
A survey of load balancing in grid computing Y Li, Z Lan International Conference on Computational and Information Science, 280-285, 2004 | 93 | 2004 |
A practical failure prediction with location and lead time for Blue Gene/P Z Zheng, Z Lan, R Gupta, S Coghlan, P Beckman 2010 International Conference on Dependable Systems and Networks Workshops …, 2010 | 89 | 2010 |
Extreme heterogeneity 2018-productive computational science in the era of extreme heterogeneity: Report for DOE ASCR workshop on extreme heterogeneity JS Vetter, R Brightwell, M Gokhale, P McCormick, R Ross, J Shalf, ... USDOE Office of Science (SC), Washington, DC (United States), 2018 | 86 | 2018 |
Dynamic load balancing of SAMR applications on distributed systems Z Lan, VE Taylor, G Bryan Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 36-36, 2001 | 86 | 2001 |
Dynamic meta-learning for failure prediction in large-scale systems: A case study J Gu, Z Zheng, Z Lan, J White, E Hocks, BH Park 2008 37th International Conference on Parallel Processing, 157-164, 2008 | 83 | 2008 |
A meta-learning failure predictor for Blue Gene/L systems P Gujrati, Y Li, Z Lan, R Thakur, J White 2007 International Conference on Parallel Processing (ICPP 2007), 40-40, 2007 | 82 | 2007 |
A study of dynamic meta-learning for failure prediction in large-scale systems Z Lan, J Gu, Z Zheng, R Thakur, S Coghlan Journal of parallel and distributed computing 70 (6), 630-643, 2010 | 76 | 2010 |
Reliability-aware scalability models for high performance computing Z Zheng, Z Lan 2009 IEEE International Conference on Cluster Computing and Workshops, 1-9, 2009 | 75 | 2009 |
Dynamic load balancing for structured adaptive mesh refinement applications Z Lan, VE Taylor, G Bryan International Conference on Parallel Processing, 2001., 571-579, 2001 | 74 | 2001 |
Trade-off between prediction accuracy and underestimation rate in job runtime estimates Y Fan, P Rich, WE Allcock, ME Papka, Z Lan 2017 IEEE International Conference on Cluster Computing (CLUSTER), 530-540, 2017 | 73 | 2017 |