Acyr Locatelli
Cohere
Verified email at cohere.ai
Title
Cited by
Year
SnapKV: LLM knows what you are looking for before generation
Y Li, Y Huang, B Yang, B Venkitesh, A Locatelli, H Ye, T Cai, P Lewis, ...
Advances in Neural Information Processing Systems 37, 22947-22970, 2024
Cited by: 120, Year: 2024
Pushing mixture of experts to the limit: Extremely parameter efficient MoE for instruction tuning
T Zadouri, A Üstün, A Ahmadian, B Ermiş, A Locatelli, S Hooker
arXiv preprint arXiv:2309.05444, 2023
Cited by: 87, Year: 2023
Aya 23: Open weight releases to further multilingual progress
V Aryabumi, J Dang, D Talupuru, S Dash, D Cairuz, H Lin, B Venkitesh, ...
arXiv preprint arXiv:2405.15032, 2024
Cited by: 72, Year: 2024
Exploring low rank training of deep neural networks
SR Kamalakara, A Locatelli, B Venkitesh, J Ba, Y Gal, AN Gomez
arXiv preprint arXiv:2209.13569, 2022
Cited by: 20, Year: 2022
BAM! Just like that: Simple and efficient parameter upcycling for mixture of experts
QI Zhang, N Gritsch, DG Talupuru, S Guo, D Cairuz, B Venkitesh, ...
Advances in Neural Information Processing Systems 37, 56304-56321, 2024
Cited by: 7*, Year: 2024
Aya 23: Open weight releases to further multilingual progress, 2024
V Aryabumi, J Dang, D Talupuru, S Dash, D Cairuz, H Lin, B Venkitesh, ...
URL https://arxiv.org/abs/2405.15032
Cited by: 5
Aya Expanse: Combining research breakthroughs for a new multilingual frontier
J Dang, S Singh, D D'souza, A Ahmadian, A Salamanca, M Smith, ...
arXiv preprint arXiv:2412.04261, 2024
Cited by: 4, Year: 2024
Procedural knowledge in pretraining drives reasoning in large language models
L Ruis, M Mozes, J Bae, SR Kamalakara, D Talupuru, A Locatelli, R Kirk, ...
arXiv preprint arXiv:2411.12580, 2024
Cited by: 4, Year: 2024
Regular cylindrical algebraic decomposition
JH Davenport, AF Locatelli, GK Sankaran
Journal of the London Mathematical Society 101 (1), 43-59, 2020
Cited by: 4, Year: 2020
To code, or not to code? exploring impact of code in pre-training
V Aryabumi, Y Su, R Ma, A Morisot, I Zhang, A Locatelli, M Fadaee, ...
arXiv preprint arXiv:2408.10914, 2024
Cited by: 3, Year: 2024
On the regularity of cylindrical algebraic decompositions
A Locatelli
University of Bath, 2015
Cited by: 2, Year: 2015
Rope to Nope and Back Again: A New Hybrid Attention Strategy
B Yang, B Venkitesh, D Talupuru, H Lin, D Cairuz, P Blunsom, A Locatelli
arXiv preprint arXiv:2501.18795, 2025
Cited by: 1, Year: 2025
Understanding likelihood over-optimisation in direct alignment algorithms
Z Shi, S Land, A Locatelli, M Geist, M Bartolo
arXiv preprint arXiv:2410.11677, 2024
Cited by: 1, Year: 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
N Gritsch, Q Zhang, A Locatelli, S Hooker, A Üstün
arXiv preprint arXiv:2408.15901, 2024
Cited by: 1, Year: 2024
System and Method for Low Rank Training of Neural Networks
SR Kamalakara, B Venkitesh, AN Gomez, AFN Locatelli
US Patent App. 17/814,041, 2023
Year: 2023
What Kind of Pretraining Data Do Large Language Models Rely on When Doing Reasoning?
L Ruis, M Mozes, J Bae, SR Kamalakara, D Gnaneshwar, A Locatelli, ...
The Thirteenth International Conference on Learning Representations