MLPerf inference benchmark VJ Reddi, C Cheng, D Kanter, P Mattson, G Schmuelling, CJ Wu, ... 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture …, 2020 | 669* | 2020 |
Automatically tuning sparse matrix-vector multiplication for GPU architectures A Monakov, A Lokhmotov, A Avetisyan International Conference on High-Performance Embedded Architectures and …, 2010 | 374 | 2010 |
Benchmarking TinyML systems: Challenges and direction CR Banbury, VJ Reddi, M Lam, W Fu, A Fazel, J Holleman, X Huang, ... arXiv preprint arXiv:2003.04821, 2020 | 303 | 2020 |
PENCIL: A platform-neutral compute intermediate language for accelerator programming R Baghdadi, U Beaugnon, A Cohen, T Grosser, M Kruse, C Reddy, ... 2015 International Conference on Parallel Architecture and Compilation (PACT …, 2015 | 164 | 2015 |
Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation D Grewe, A Lokhmotov Proceedings of the Fourth Workshop on General Purpose Processing on Graphics …, 2011 | 74 | 2011 |
Collective Mind: Towards Practical and Collaborative Auto-Tuning G Fursin, R Miceli, A Lokhmotov, M Gerndt, M Baboulin, AD Malony, ... Scientific Programming 22 (4), 309-329, 2014 | 46 | 2014 |
Deriving efficient data movement from decoupled access/execute specifications LW Howes, A Lokhmotov, AF Donaldson, PHJ Kelly High Performance Embedded Architectures and Compilers: Fourth International …, 2009 | 45 | 2009 |
Collective knowledge: Towards R&D sustainability G Fursin, A Lokhmotov, E Plowman 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 864-869, 2016 | 37 | 2016 |
VOBLA: A vehicle for optimized basic linear algebra U Beaugnon, A Kravets, S Van Haastregt, R Baghdadi, D Tweed, J Absar, ... Proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers …, 2014 | 28 | 2014 |
Benchmarking TinyML systems: Challenges and direction CR Banbury, VJ Reddi, M Lam, W Fu, A Fazel, J Holleman, X Huang, ... arXiv preprint arXiv:2003.04821, 2020 | 26 | 2020 |
PENCIL 1.0 Language Specification R Baghdadi, A Cohen, T Grosser, S Verdoolaege, J Absar, ... Research Report 8706, INRIA, 2015 | 25* | 2015 |
PENCIL: Towards a platform-neutral compute intermediate language for DSLs R Baghdadi, A Cohen, S Guelton, S Verdoolaege, J Inoue, T Grosser, ... arXiv preprint arXiv:1302.5586, 2013 | 24 | 2013 |
A collective knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques G Fursin, A Lokhmotov, D Savenko, E Upton arXiv preprint arXiv:1801.08024, 2018 | 20 | 2018 |
Collective Mind, Part II: Towards performance-and cost-aware software engineering as a natural science G Fursin, A Memon, C Guillon, A Lokhmotov arXiv preprint arXiv:1506.06256, 2015 | 18 | 2015 |
Auto-parallelisation of Sieve C++ programs A Donaldson, C Riley, A Lokhmotov, A Cook Euro-Par 2007 Workshops: Parallel Processing: HPPC 2007, UNICORE Summit 2007 …, 2008 | 18 | 2008 |
Delayed side-effects ease multi-core programming A Lokhmotov, A Mycroft, A Richards Euro-Par 2007 Parallel Processing: 13th International Euro-Par Conference …, 2007 | 17 | 2007 |
Optimizing OpenCL Kernels for the ARM Mali-T600 GPUs J Gronqvist, A Lokhmotov GPU Pro 360 Guide to Mobile Devices, 167-198, 2018 | 15 | 2018 |
Generating GPU code from a high-level representation for image processing kernels R Membarth, A Lokhmotov, J Teich Euro-Par 2011: Parallel Processing Workshops: CCPI, CGWS, HeteroPar, HiBB …, 2012 | 15 | 2012 |
On the anatomy of predictive models for accelerating GPU convolution kernels and beyond PS Labini, M Cianfriglia, D Perri, O Gervasi, G Fursin, A Lokhmotov, ... ACM Transactions on Architecture and Code Optimization (TACO) 18 (1), 1-24, 2021 | 14 | 2021 |
Configuring thread scheduling on a multi-threaded data processing apparatus C Nugteren, A Lokhmotov US Patent 10,733,012, 2020 | 14 | 2020 |