Solving quantitative reasoning problems with language models. A Lewkowycz, A Andreassen, D Dohan, E Dyer, H Michalewski, et al. Advances in Neural Information Processing Systems 35, 3843-3857, 2022. | 749 | 2022 |
Linear Transformers are Secretly Fast Weight Programmers. I Schlag*, K Irie*, J Schmidhuber. International Conference on Machine Learning, 9355-9366, 2021. | 276* | 2021 |
Block-Recurrent Transformers. DL Hutchins*, I Schlag*, Y Wu, E Dyer, B Neyshabur. arXiv preprint arXiv:2203.07852, 2022. | 124 | 2022 |
Learning to reason with third order tensor products. I Schlag, J Schmidhuber. Advances in Neural Information Processing Systems 31, 9981-9993, 2018. | 84 | 2018 |
Enhancing the transformer with explicit relational encoding for math problem solving. I Schlag, P Smolensky, R Fernandez, N Jojic, J Schmidhuber, J Gao. arXiv preprint arXiv:1910.06611, 2019. | 78 | 2019 |
Going beyond linear transformers with recurrent fast weight programmers. K Irie*, I Schlag*, R Csordás, J Schmidhuber. Advances in Neural Information Processing Systems 34, 2021. | 77 | 2021 |
Mindstorms in Natural Language-Based Societies of Mind. M Zhuge, H Liu, F Faccio, DR Ashley, R Csordás, A Gopalakrishnan, et al. arXiv preprint arXiv:2305.17066, 2023. | 73 | 2023 |
Learning Associative Inference Using Fast Weight Memory. I Schlag, T Munkhdalai, J Schmidhuber. International Conference on Learning Representations, 2021. | 50 | 2021 |
A Modern Self-Referential Weight Matrix That Learns to Modify Itself. K Irie, I Schlag, R Csordás, J Schmidhuber. Deep RL Workshop, NeurIPS 2021. | 42 | 2021 |
Ancient Roman coin recognition in the wild using deep learning based recognition of artistically depicted face profiles. I Schlag, O Arandjelovic. Proceedings of the IEEE International Conference on Computer Vision …, 2017. | 42 | 2017 |
Gated fast weights for on-the-fly neural program generation. I Schlag, J Schmidhuber. NIPS Metalearning Workshop, 2017. | 33 | 2017 |
Large Language Model Programs. I Schlag, S Sukhbaatar, A Celikyilmaz, W Yih, J Weston, J Schmidhuber, et al. arXiv preprint arXiv:2305.05364, 2023. | 26 | 2023 |
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute. A Stanić, D Ashley, O Serikov, L Kirsch, F Faccio, J Schmidhuber, et al. arXiv preprint arXiv:2309.11197, 2023. | 9 | 2023 |
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge. A Romanou, N Foroutan, A Sotnikova, Z Chen, SH Nelaturu, S Singh, et al. arXiv preprint arXiv:2411.19799, 2024. | 6 | 2024 |
Understanding and Minimising Outlier Features in Neural Network Training. B He, L Noci, D Paliotta, I Schlag, T Hofmann. arXiv preprint arXiv:2405.19279, 2024. | 5 | 2024 |
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments. A Schäfer, S Ravfogel, T Hofmann, T Pimentel, I Schlag. arXiv preprint arXiv:2404.07982, 2024. | 3* | 2024 |
Navigating Scaling Laws: Accelerating Vision Transformer's Training via Adaptive Strategies. S Anagnostidis, G Bachmann, I Schlag, T Hofmann. arXiv preprint arXiv:2311.03233, 2024. | 3 | 2024 |
Understanding and Minimising Outlier Features in Transformer Training. B He, L Noci, D Paliotta, I Schlag, T Hofmann. Advances in Neural Information Processing Systems 37, 83786-83846, 2025. | 1 | 2025 |