Yuhang Cao
MMLab The Chinese University of Hong Kong
Verified email at ie.cuhk.edu.hk
Title | Cited by | Year
MMDetection: Open mmlab detection toolbox and benchmark
K Chen, J Wang, J Pang, Y Cao, Y Xiong, X Li, S Sun, W Feng, Z Liu, J Xu, ...
arXiv preprint arXiv:1906.07155, 2019
Cited by: 3568 | Year: 2019
Seesaw loss for long-tailed instance segmentation
J Wang, W Zhang, Y Zang, Y Cao, J Pang, T Gong, K Chen, Z Liu, CC Loy, ...
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
Cited by: 311 | Year: 2021
Prime sample attention in object detection
Y Cao, K Chen, CC Loy, D Lin
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020
Cited by: 274 | Year: 2020
Internlm-xcomposer2: Mastering free-form text-image composition and comprehension in vision-language large model
X Dong, P Zhang, Y Zang, Y Cao, B Wang, L Ouyang, X Wei, S Zhang, ...
arXiv preprint arXiv:2401.16420, 2024
Cited by: 240 | Year: 2024
Internlm-xcomposer: A vision-language large model for advanced text-image comprehension and composition
P Zhang, X Dong, B Wang, Y Cao, C Xu, L Ouyang, Z Zhao, H Duan, ...
arXiv preprint arXiv:2309.15112, 2023
Cited by: 201 | Year: 2023
Side-aware boundary localization for more precise object detection
J Wang, W Zhang, Y Cao, K Chen, J Pang, T Gong, J Shi, CC Loy, D Lin
Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020
Cited by: 184 | Year: 2020
MMDetection: Open mmlab detection toolbox and benchmark
K Chen, J Wang, J Pang, Y Cao, Y Xiong, X Li, S Sun, W Feng, Z Liu, J Xu, ...
arXiv preprint arXiv:1906.07155, 2019
Cited by: 134 | Year: 2019
Internlm-xcomposer2-4khd: A pioneering large vision-language model handling resolutions from 336 pixels to 4k hd
X Dong, P Zhang, Y Zang, Y Cao, B Wang, L Ouyang, S Zhang, H Duan, ...
Advances in Neural Information Processing Systems 37, 42566-42592, 2024
Cited by: 118 | Year: 2024
Few-shot object detection via association and discrimination
Y Cao, J Wang, Y Jin, T Wu, K Chen, Z Liu, D Lin
Advances in neural information processing systems 34, 16570-16581, 2021
Cited by: 116 | Year: 2021
Internlm-xcomposer-2.5: A versatile large vision language model supporting long-contextual input and output
P Zhang, X Dong, Y Zang, Y Cao, R Qian, L Chen, Q Guo, H Duan, ...
arXiv preprint arXiv:2407.03320, 2024
Cited by: 86 | Year: 2024
Feature pyramid grids
K Chen, Y Cao, CC Loy, D Lin, C Feichtenhofer
arXiv preprint arXiv:2004.03580, 2020
Cited by: 67 | Year: 2020
V3det: Vast vocabulary visual detection dataset
J Wang, P Zhang, T Chu, Y Cao, Y Zhou, T Wu, B Wang, C He, D Lin
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 57 | Year: 2023
Pyramiddrop: Accelerating your large vision-language models via pyramid visual redundancy reduction
L Xing, Q Huang, X Dong, J Lu, P Zhang, Y Zang, Y Cao, C He, J Wang, ...
arXiv preprint arXiv:2410.17247, 2024
Cited by: 13 | Year: 2024
Wssod: A new pipeline for weakly-and semi-supervised object detection
S Fang, Y Cao, X Wang, K Chen, D Lin, W Zhang
arXiv preprint arXiv:2105.11293, 2021
Cited by: 13 | Year: 2021
Mini: Mining implicit novel instances for few-shot object detection
Y Cao, J Wang, Y Lin, D Lin
arXiv preprint arXiv:2205.03381, 2022
Cited by: 12 | Year: 2022
MMDetection: Open mmlab detection toolbox and benchmark
K Chen, J Wang, J Pang, Y Cao, Y Xiong, X Li, S Sun, W Feng, Z Liu, J Xu, ...
arXiv preprint arXiv:1906.07155, 2019
Cited by: 12 | Year: 2019
Dualfocus: Integrating macro and micro perspectives in multi-modal large language models
Y Cao, P Zhang, X Dong, D Lin, J Wang
arXiv preprint arXiv:2402.14767, 2024
Cited by: 11 | Year: 2024
Mia-dpo: Multi-image augmented direct preference optimization for large vision-language models
Z Liu, Y Zang, X Dong, P Zhang, Y Cao, H Duan, C He, Y Xiong, D Lin, ...
arXiv preprint arXiv:2410.17637, 2024
Cited by: 7 | Year: 2024
Internlm-xcomposer2.5-omnilive: A comprehensive multimodal system for long-term streaming video and audio interactions
P Zhang, X Dong, Y Cao, Y Zang, R Qian, X Wei, L Chen, Y Li, J Niu, ...
arXiv preprint arXiv:2412.09596, 2024
Cited by: 5 | Year: 2024
Sam2long: Enhancing sam 2 for long video segmentation with a training-free memory tree
S Ding, R Qian, X Dong, P Zhang, Y Zang, Y Cao, Y Guo, D Lin, J Wang
arXiv preprint arXiv:2410.16268, 2024
Cited by: 4 | Year: 2024