Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang, C Li, J Yang, H Su, J Zhu, ... arXiv preprint arXiv:2303.05499, 2023 | 1278 | 2023 |
Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers Z Huang, Z Zeng, B Liu, D Fu, J Fu arXiv preprint arXiv:2004.00849, 2020 | 453 | 2020 |
Seeing out of the box: End-to-end pre-training for vision-language representation learning Z Huang, Z Zeng, Y Huang, B Liu, D Fu, J Fu Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 291 | 2021 |
WSOD2: Learning bottom-up and top-down objectness distillation for weakly-supervised object detection Z Zeng, B Liu, J Fu, H Chao, L Zhang Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 176 | 2019 |
Grounded SAM: Assembling open-world models for diverse visual tasks T Ren, S Liu, A Zeng, J Lin, K Li, H Cao, J Chen, X Huang, Y Chen, F Yan, ... arXiv preprint arXiv:2401.14159, 2024 | 162 | 2024 |
Active contrastive learning of audio-visual video representations S Ma, Z Zeng, D McDuff, Y Song arXiv preprint arXiv:2009.09805, 2020 | 114 | 2020 |
Mind the discriminability: Asymmetric adversarial domain adaptation J Yang, H Zou, Y Zhou, Z Zeng, L Xie Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 57 | 2020 |
Contrastive learning of global and local video representations Z Zeng, D McDuff, Y Song Advances in Neural Information Processing Systems 34, 7025-7040, 2021 | 54 | 2021 |
GarbageNet: a unified learning framework for robust garbage classification J Yang, Z Zeng, K Wang, H Zou, L Xie IEEE Transactions on Artificial Intelligence 2 (4), 372-380, 2021 | 54 | 2021 |
Suppressing mislabeled data via grouping and self-attention X Peng, K Wang, Z Zeng, Q Li, J Yang, Y Qiao Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 40 | 2020 |
SMP Challenge: An overview of Social Media Prediction Challenge 2019 B Wu, WH Cheng, P Liu, B Liu, Z Zeng, J Luo Proceedings of the 27th ACM International Conference on Multimedia, 2667-2671, 2019 | 40 | 2019 |
Reference-based defect detection network Z Zeng, B Liu, J Fu, H Chao IEEE Transactions on Image Processing 30, 6637-6647, 2021 | 36 | 2021 |
Detection transformer with stable matching S Liu, T Ren, J Chen, Z Zeng, H Zhang, F Li, H Li, J Huang, H Su, J Zhu, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 31 | 2023 |
DFA3D: 3D deformable attention for 2D-to-3D feature lifting H Li, H Zhang, Z Zeng, S Liu, F Li, T Ren, L Zhang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 15 | 2023 |
T-Rex2: Towards generic object detection via text-visual prompt synergy Q Jiang, F Li, Z Zeng, T Ren, S Liu, L Zhang European Conference on Computer Vision, 38-57, 2025 | 14 | 2025 |
detrex: Benchmarking detection transformers T Ren, S Liu, F Li, H Zhang, A Zeng, J Yang, X Liao, D Jia, H Li, H Cao, ... arXiv preprint arXiv:2306.07265, 2023 | 12 | 2023 |
Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers. arXiv 2020 Z Huang, Z Zeng, B Liu, D Fu, J Fu arXiv preprint arXiv:2004.00849, 2020 | 12 | 2020 |
ActivityNet 2019 task 3: Exploring contexts for dense captioning events in videos S Chen, Y Song, Y Zhao, Q Jin, Z Zeng, B Liu, J Fu, A Hauptmann arXiv preprint arXiv:1907.05092, 2019 | 12 | 2019 |
Learning rich image region representation for visual question answering B Liu, Z Huang, Z Zeng, Z Chen, J Fu arXiv preprint arXiv:1910.13077, 2019 | 11 | 2019 |
Tencent-MVSE: A large-scale benchmark dataset for multi-modal video similarity evaluation Z Zeng, Y Luo, Z Liu, F Rao, D Li, W Guo, Z Wen Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 10 | 2022 |