Haotian Zhang

Cited by

	All	Since 2020
Citations	3424	3372
h-index	19	19
i10-index	27	26

Citations per year

Year	Citations
2019	44
2020	130
2021	196
2022	281
2023	693
2024	1761
2025	309

Public access

2 articles

1 article

available

not available

Based on funding mandates

Co-authors

Jenq-Neng Hwang	University of Washington	Verified email at u.washington.edu
Yinfei Yang	Apple	Verified email at apple.com
Zhe Gan	Research Scientist, Apple	Verified email at apple.com
Bowen Zhang	Apple	Verified email at apple.com
Yizhou Wang	NVIDIA; University of Washington	Verified email at nvidia.com
Jianfeng Gao	Microsoft Research, Redmond	Verified email at microsoft.com
Pengchuan Zhang	Meta AI	Verified email at fb.com
Lijuan Wang	Microsoft GenAI	Verified email at microsoft.com
Liunian Harold Li	OpenAI	Verified email at cs.ucla.edu
Xianzhi Du	Research Scientist, Apple AI/ML	Verified email at apple.com
Gaoang Wang	Zhejiang University / University of Illinois Urbana-Champaign Institute	Verified email at intl.zju.edu.cn
Haoxuan You	Columbia University	Verified email at columbia.edu
Lei Zhang	International Digital Economy Academy (IDEA)	Verified email at idea.edu.cn
Jianwei Yang	Principal Researcher, Microsoft Research, Redmond	Verified email at microsoft.com
Chunyuan Li	xAI	Verified email at x.ai
Yanghao Li	Facebook AI Research (FAIR)	Verified email at fb.com

Haotian Zhang

Research Scientist, Apple

Verified email at apple.com

Deep Learning, Computer Vision, Vision + Language


Title	Cited by	Year
Grounded language-image pre-training. LH Li, P Zhang, H Zhang*, J Yang, C Li, Y Zhong, L Wang, L Yuan, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	1202	2022
GLIPv2: Unifying localization and vision-language understanding. H Zhang, P Zhang, X Hu, YC Chen, LH Li, X Dai, L Wang, L Yuan, ... NeurIPS, 2022	313	2022
Ferret: Refer and ground anything anywhere at any granularity. H You, H Zhang, Z Gan, X Du, B Zhang, Z Wang, L Cao, SF Chang, ... ICLR, 2023	263	2023
Simple applications of BERT for ad hoc document retrieval. W Yang, H Zhang, J Lin. arXiv preprint arXiv:1903.10972, 2019	242	2019
Exploit the connectivity: Multi-object tracking with TrackletNet. G Wang, Y Wang, H Zhang, R Gu, JN Hwang. Proceedings of the 27th ACM International Conference on Multimedia, 482-490, 2019	240	2019
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training. B McKinzie, Z Gan, JP Fauconnier, S Dodge, B Zhang, P Dufter, D Shah, ... ECCV, 2024	217	2024
TransMVSNet: Global context-aware multi-view stereo network with transformers. Y Ding, W Yuan, Q Zhu, H Zhang, X Liu, Y Wang, X Liu. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	216	2022
An internal learning approach to video inpainting. H Zhang, L Mai, N Xu, Z Wang, J Collomosse, H Jin. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019	99	2019
Eye in the sky: Drone-based object tracking and 3D localization. H Zhang, G Wang, Z Lei, JN Hwang. Proceedings of the 27th ACM International Conference on Multimedia, 899-907, 2019	92	2019
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. K You, H Zhang, E Schoop, F Weers, A Swearngin, J Nichols, Y Yang, ... ECCV, 2024	84	2024
VisDrone-MOT2019: The vision meets drone multiple object tracking challenge results. L Wen, P Zhu, D Du, X Bian, H Ling, Q Hu, J Zheng, T Peng, X Wang, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019	64	2019
VisDrone-SOT2019: The vision meets drone single object tracking challenge results. D Du, P Zhu, L Wen, X Bian, H Ling, Q Hu, J Zheng, T Peng, X Wang, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019	61	2019
Apple Intelligence foundation language models. T Gunter, Z Wang, C Wang, R Pang, A Narayanan, A Zhang, B Zhang, ... arXiv preprint arXiv:2407.21075, 2024	41	2024
Ferret-v2: An improved baseline for referring and grounding with large language models. H Zhang. COLM, 2024	30*	2024
How easy is it to fool your multimodal LLMs? An empirical analysis on deceptive prompts. Y Qian, H Zhang, Y Yang, Z Gan. arXiv preprint arXiv:2402.13220 2 (7), 2024	28	2024
From scarcity to efficiency: Improving CLIP training via visual-enriched captions. Z Lai, H Zhang, W Wu, H Bai, A Timofeev, X Du, Z Gan, J Shan, ... ECCV 2024, 2023	27	2023
From scarcity to efficiency: Improving CLIP training via visual-enriched captions. Z Lai, H Zhang, B Zhang, W Wu, H Bai, A Timofeev, X Du, Z Gan, J Shan, ... European Conference on Computer Vision, 111-127, 2025	21*	2025
Bundle adjustment for monocular visual odometry based on detections of traffic signs. Y Zhang, H Zhang, G Wang, J Yang, JN Hwang. IEEE Transactions on Vehicular Technology 69 (1), 151-162, 2019	21	2019
Empowering unsupervised domain adaptation with large-scale pre-trained vision-language models. Z Lai, H Bai, H Zhang, X Du, J Shan, Y Yang, CN Chuah, M Cao. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024	19	2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. H Zhang, M Gao, Z Gan*, P Dufter, N Wenzel, F Huang, D Shah, X Du, ... ICLR 2025, 2024	18	2024
