Haotian Zhang
Research Scientist, Apple
Verified email at apple.com
Title
Cited by
Year
Grounded language-image pre-training
LH Li*, P Zhang*, H Zhang*, J Yang, C Li, Y Zhong, L Wang, L Yuan, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by: 1202; Year: 2022
GLIPv2: Unifying localization and vision-language understanding
H Zhang*, P Zhang*, X Hu, YC Chen, LH Li, X Dai, L Wang, L Yuan, ...
NeurIPS, 2022
Cited by: 313; Year: 2022
Ferret: Refer and ground anything anywhere at any granularity
H You*, H Zhang*, Z Gan, X Du, B Zhang, Z Wang, L Cao, SF Chang, ...
ICLR, 2023
Cited by: 263; Year: 2023
Simple applications of BERT for ad hoc document retrieval
W Yang, H Zhang, J Lin
arXiv preprint arXiv:1903.10972, 2019
Cited by: 242; Year: 2019
Exploit the connectivity: Multi-object tracking with TrackletNet
G Wang, Y Wang, H Zhang, R Gu, JN Hwang
Proceedings of the 27th ACM International Conference on Multimedia, 482-490, 2019
Cited by: 240; Year: 2019
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
B McKinzie*, Z Gan*, JP Fauconnier, S Dodge, B Zhang, P Dufter, D Shah, ...
ECCV, 2024
Cited by: 217; Year: 2024
TransMVSNet: Global context-aware multi-view stereo network with transformers
Y Ding, W Yuan, Q Zhu, H Zhang, X Liu, Y Wang, X Liu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by: 216; Year: 2022
An internal learning approach to video inpainting
H Zhang, L Mai, N Xu, Z Wang, J Collomosse, H Jin
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019
Cited by: 99; Year: 2019
Eye in the sky: Drone-based object tracking and 3D localization
H Zhang, G Wang, Z Lei, JN Hwang
Proceedings of the 27th ACM International Conference on Multimedia, 899-907, 2019
Cited by: 92; Year: 2019
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
K You, H Zhang, E Schoop, F Weers, A Swearngin, J Nichols, Y Yang, ...
ECCV, 2024
Cited by: 84; Year: 2024
VisDrone-MOT2019: The vision meets drone multiple object tracking challenge results
L Wen, P Zhu, D Du, X Bian, H Ling, Q Hu, J Zheng, T Peng, X Wang, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019
Cited by: 64; Year: 2019
VisDrone-SOT2019: The vision meets drone single object tracking challenge results
D Du, P Zhu, L Wen, X Bian, H Ling, Q Hu, J Zheng, T Peng, X Wang, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019
Cited by: 61; Year: 2019
Apple Intelligence foundation language models
T Gunter, Z Wang, C Wang, R Pang, A Narayanan, A Zhang, B Zhang, ...
arXiv preprint arXiv:2407.21075, 2024
Cited by: 41; Year: 2024
Ferret-v2: An improved baseline for referring and grounding with large language models
H Zhang
COLM, 2024
Cited by: 30*; Year: 2024
How easy is it to fool your multimodal LLMs? An empirical analysis on deceptive prompts
Y Qian, H Zhang, Y Yang, Z Gan
arXiv preprint arXiv:2402.13220, 2024
Cited by: 28; Year: 2024
From scarcity to efficiency: Improving CLIP training via visual-enriched captions
Z Lai*, H Zhang*, W Wu, H Bai, A Timofeev, X Du, Z Gan, J Shan, ...
ECCV 2024, 2023
Cited by: 27; Year: 2023
From scarcity to efficiency: Improving CLIP training via visual-enriched captions
Z Lai*, H Zhang*, B Zhang, W Wu, H Bai, A Timofeev, X Du, Z Gan, J Shan, ...
European Conference on Computer Vision, 111-127, 2025
Cited by: 21*; Year: 2025
Bundle adjustment for monocular visual odometry based on detections of traffic signs
Y Zhang, H Zhang, G Wang, J Yang, JN Hwang
IEEE Transactions on Vehicular Technology 69 (1), 151-162, 2019
Cited by: 21; Year: 2019
Empowering unsupervised domain adaptation with large-scale pre-trained vision-language models
Z Lai, H Bai, H Zhang, X Du, J Shan, Y Yang, CN Chuah, M Cao
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024
Cited by: 19; Year: 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
H Zhang*, M Gao*, Z Gan*, P Dufter, N Wenzel, F Huang, D Shah, X Du, ...
ICLR 2025, 2024
Cited by: 18; Year: 2024