Antoine Miech

Cited by

	All	Since 2019
Citations	9980	9882
h-index	20	20
i10-index	22	22

6000

3000

1500

4500

201820192020202120222023202472 104 224 525 934 2157 5755

Public access

View all

9 articles

2 articles

available

not available

Based on funding mandates

Co-authors

Ivan LaptevProfessor at MBZUAI, on leave from INRIAVerified email at inria.fr
Josef SivicCzech Technical University, CIIRC, ELLIS Unit PragueVerified email at cvut.cz
Jean-Baptiste AlayracDeepMind, LondonVerified email at google.com
Andrew ZissermanUniversity of OxfordVerified email at robots.ox.ac.uk
Cordelia SchmidResearch director INRIA Verified email at inria.fr
Antoine YangGoogle DeepMindVerified email at google.com
Makarand TapaswiIIIT Hyderabad, Wadhwani AIVerified email at iiit.ac.in
Dimitri ZhukovTractableVerified email at tractable.ai
Lorenzo TorresaniMeta, Fundamental AI Research (FAIR)Verified email at meta.com
Heng WangNvidiaVerified email at fb.com
Du TranGoogleVerified email at google.com
Jeff DonahueResearch Scientist, DeepMindVerified email at google.com
Piotr BojanowskiMeta FAIRVerified email at fb.com
Karen SimonyanChief Scientist, Microsoft AIVerified email at microsoft.com

Antoine Miech

Google DeepMind

Verified email at google.com - Homepage

Computer Vision


Title Sort by citations Sort by year Sort by title	Cited by Cited by	Year
Flamingo: a visual language model for few-shot learning JB Alayrac, J Donahue, P Luc, A Miech, I Barr, Y Hasson, K Lenc, ... Advances in neural information processing systems 35, 23716-23736, 2022	3205	2022
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023	2084	2023
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips A Miech, D Zhukov, JB Alayrac, M Tapaswi, I Laptev, J Sivic Proceedings of the IEEE International Conference on Computer Vision, 2630-2640, 2019	1235	2019
End-to-end learning of visual representations from uncurated instructional videos A Miech, JB Alayrac, L Smaira, I Laptev, J Sivic, A Zisserman Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020	805	2020
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024	617	2024
Learnable pooling with context gating for video classification A Miech, I Laptev, J Sivic arXiv preprint arXiv:1706.06905, 2017	403	2017
Just ask: Learning to answer questions from millions of narrated videos A Yang, A Miech, J Sivic, I Laptev, C Schmid Proceedings of the IEEE/CVF international conference on computer vision …, 2021	295	2021
Learning a text-video embedding from incomplete and heterogeneous data A Miech, I Laptev, J Sivic arXiv preprint arXiv:1804.02516, 2018	261	2018
Zero-shot video question answering via frozen bidirectional language models A Yang, A Miech, J Sivic, I Laptev, C Schmid Advances in Neural Information Processing Systems 35, 124-141, 2022	209	2022
Vid2seq: Large-scale pretraining of a visual language model for dense video captioning A Yang, A Nagrani, PH Seo, A Miech, J Pont-Tuset, I Laptev, J Sivic, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	195	2023
Thinking fast and slow: Efficient text-to-visual retrieval with transformers A Miech, JB Alayrac, I Laptev, J Sivic, A Zisserman Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	148	2021
Tubedetr: Spatio-temporal video grounding with transformers A Yang, A Miech, J Sivic, I Laptev, C Schmid Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	97	2022
Leveraging the present to anticipate the future in videos A Miech, I Laptev, J Sivic, H Wang, L Torresani, D Tran Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019	84	2019
Mikoł aj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, and Karén Simonyan. Flamingo: a visual language model for few-shot learning JB Alayrac, J Donahue, P Luc, A Miech, I Barr, Y Hasson, K Lenc, ... Advances in Neural Information Processing Systems 35, 23716-23736, 2022	75	2022
Learning from video and text via large-scale discriminative clustering A Miech, JB Alayrac, P Bojanowski, I Laptev, J Sivic Proceedings of the IEEE international conference on computer vision, 5257-5266, 2017	50	2017
Perception test: A diagnostic benchmark for multimodal video models V Patraucean, L Smaira, A Gupta, A Recasens, L Markeeva, D Banarse, ... Advances in Neural Information Processing Systems 36, 2024	46	2024
Learning to answer visual questions from web videos A Yang, A Miech, J Sivic, I Laptev, C Schmid arXiv preprint arXiv:2205.05019, 2022	38	2022
Look for the change: Learning object states and state-modifying actions from untrimmed web videos T Souček, JB Alayrac, A Miech, I Laptev, J Sivic Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	29	2022
Rareact: A video dataset of unusual interactions A Miech, JB Alayrac, I Laptev, J Sivic, A Zisserman arXiv preprint arXiv:2008.01018, 2020	26	2020
Zorro: the masked multimodal transformer A Recasens, J Lin, J Carreira, D Jaegle, L Wang, J Alayrac, P Luc, ... arXiv preprint arXiv:2301.09595, 2023	22	2023

The system can't perform the operation now. Try again later.

Articles 1–20

Citations per year

Duplicate citations

Merged citations

Add co-authorsCo-authors

Follow

Cited by

Co-authors