Voxceleb: a large-scale speaker identification dataset | A Nagrani, JS Chung, A Zisserman | arXiv preprint arXiv:1706.08612, 2017 | 2730 | 2017 |
Voxceleb2: Deep speaker recognition | JS Chung, A Nagrani, A Zisserman | arXiv preprint arXiv:1806.05622, 2018 | 2607 | 2018 |
Frozen in time: A joint video and image encoder for end-to-end retrieval | M Bain, A Nagrani, G Varol, A Zisserman | Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 1036 | 2021 |
Voxceleb: Large-scale speaker verification in the wild | A Nagrani, JS Chung, W Xie, A Zisserman | Computer Speech & Language 60, 101027, 2020 | 752 | 2020 |
Attention bottlenecks for multimodal fusion | A Nagrani, S Yang, A Arnab, A Jansen, C Schmid, C Sun | Advances in neural information processing systems 34, 14200-14213, 2021 | 595 | 2021 |
Use what you have: Video retrieval using representations from collaborative experts | Y Liu, S Albanie, A Nagrani, A Zisserman | arXiv preprint arXiv:1907.13487, 2019 | 440 | 2019 |
Utterance-level aggregation for speaker recognition in the wild | W Xie, A Nagrani, JS Chung, A Zisserman | ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 424 | 2019 |
Epic-fusion: Audio-visual temporal binding for egocentric action recognition | E Kazakos, A Nagrani, A Zisserman, D Damen | Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 410 | 2019 |
Emotion recognition in speech using cross-modal transfer in the wild | S Albanie, A Nagrani, A Vedaldi, A Zisserman | Proceedings of the 26th ACM international conference on Multimedia, 292-301, 2018 | 324 | 2018 |
Seeing voices and hearing faces: Cross-modal biometric matching | A Nagrani, S Albanie, A Zisserman | Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 255 | 2018 |
Chimpanzee face recognition from videos in the wild using deep learning | D Schofield, A Nagrani, A Zisserman, M Hayashi, T Matsuzawa, D Biro, ... | Science advances 5 (9), eaaw0736, 2019 | 211 | 2019 |
Localizing visual sounds the hard way | H Chen, W Xie, T Afouras, A Nagrani, A Vedaldi, A Zisserman | Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 204 | 2021 |
Vid2seq: Large-scale pretraining of a visual language model for dense video captioning | A Yang, A Nagrani, PH Seo, A Miech, J Pont-Tuset, I Laptev, J Sivic, ... | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 195 | 2023 |
End-to-end generative pretraining for multimodal video captioning | PH Seo, A Nagrani, A Arnab, C Schmid | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 186 | 2022 |
Spot the conversation: speaker diarisation in the wild | JS Chung, J Huh, A Nagrani, T Afouras, A Zisserman | arXiv preprint arXiv:2007.01216, 2020 | 174 | 2020 |
Learnable pins: Cross-modal embeddings for person identity | A Nagrani, S Albanie, A Zisserman | Proceedings of the European conference on computer vision (ECCV), 71-88, 2018 | 166 | 2018 |
Pali-x: On scaling up a multilingual vision and language model | X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ... | arXiv preprint arXiv:2305.18565, 2023 | 142 | 2023 |
Cough against covid: Evidence of covid-19 signature in cough sounds | P Bagad, A Dalmia, J Doshi, A Nagrani, P Bhamare, A Mahale, S Rane, ... | arXiv preprint arXiv:2009.08790, 2020 | 138 | 2020 |
Disentangled speech embeddings using cross-modal self-supervision | A Nagrani, JS Chung, S Albanie, A Zisserman | ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 108 | 2020 |
Condensed movies: Story based retrieval with contextual embeddings | M Bain, A Nagrani, A Brown, A Zisserman | Proceedings of the Asian Conference on Computer Vision, 2020 | 107 | 2020 |