Xiao Wang
Google DeepMind
Verified email at google.com - Homepage
Title
Cited by
Year
Pali: A jointly-scaled multilingual language-image model
X Chen, X Wang, S Changpinyo, AJ Piergiovanni, P Padlewski, D Salz, ...
ICLR 2023 (Oral), 2022
693 · 2022
Scaling vision transformers to 22 billion parameters
M Dehghani, J Djolonga, B Mustafa, P Padlewski, J Heek, J Gilmer, ...
ICML 2023 (Oral), 2023
604 · 2023
LiT: Zero-Shot Transfer with Locked-image Text Tuning
X Zhai, X Wang, B Mustafa, A Steiner, D Keysers, A Kolesnikov, L Beyer
CVPR 2022, 2021
591 · 2021
Simple Open-Vocabulary Object Detection with Vision Transformers
M Minderer, A Gritsenko, A Stone, M Neumann, D Weissenborn, ...
ECCV 2022, 2022
540* · 2022
Measuring compositional generalization: A comprehensive method on realistic data
D Keysers, N Schärli, N Scales, H Buisman, D Furrer, S Kashubin, ...
ICLR 2020, 2019
392 · 2019
Pali-x: On scaling up a multilingual vision and language model
X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ...
CVPR 2024, 2023
178 · 2023
Paligemma: A versatile 3b vlm for transfer
L Beyer, A Steiner, AS Pinto, A Kolesnikov, X Wang, D Salz, M Neumann, ...
arXiv preprint arXiv:2407.07726, 2024
147 · 2024
Pali-3 vision language models: Smaller, faster, stronger
X Chen, X Wang, L Beyer, A Kolesnikov, J Wu, P Voigtlaender, B Mustafa, ...
arXiv preprint arXiv:2310.09199, 2023
86 · 2023
No filter: Cultural and socioeconomic diversity in contrastive vision-language models
A Pouget, L Beyer, E Bugliarello, X Wang, AP Steiner, X Zhai, ...
NeurIPS 2024, 2024
14 · 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
I Alabdulmohsin, X Wang, A Steiner, P Goyal, A D'Amour, X Zhai
ICLR 2024, 2024
12 · 2024
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
J Kossen, M Collier, B Mustafa, X Wang, X Zhai, L Beyer, A Steiner, ...
NeurIPS 2023, 2023
10 · 2023
A study of autoregressive decoders for multi-tasking in computer vision
L Beyer, B Wan, G Madan, F Pavetic, A Steiner, A Kolesnikov, AS Pinto, ...
arXiv preprint arXiv:2303.17376, 2023
8 · 2023
Paligemma 2: A family of versatile vlms for transfer
A Steiner, AS Pinto, M Tschannen, D Keysers, X Wang, Y Bitton, ...
arXiv preprint arXiv:2412.03555, 2024
7 · 2024
LocCa: Visual Pretraining with Location-aware Captioners
B Wan, M Tschannen, Y Xian, F Pavetic, I Alabdulmohsin, X Wang, ...
NeurIPS 2024, 2024
5 · 2024
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
M Tschannen, A Gritsenko, X Wang, MF Naeem, X Zhai, ...
2025
Scaling Pre-training to One Hundred Billion Data for Vision Language Models
X Wang, I Alabdulmohsin, D Salz, Z Li, K Rong, X Zhai
arXiv preprint arXiv:2502.07617, 2025
2025
Locked-Model Multimodal Contrastive Tuning
D Keysers, X Zhai, X Wang, L Beyer, B Mustafa, A Steiner, A Kolesnikov
US Patent App. 18/051,106, 2024
2024
Articles 1–17