Guo Chen
Verified email at smail.nju.edu.cn
Title
Cited by
Year
InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks
Z Chen, J Wu, W Wang, W Su, G Chen, S Xing, M Zhong, Q Zhang, X Zhu, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 383*, Year: 2024
InternVideo: General video foundation models via generative and discriminative learning
Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ...
arXiv preprint arXiv:2212.03191, 2022
Cited by: 281, Year: 2022
InternVid: A large-scale video-text dataset for multimodal understanding and generation
Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ...
ICLR2023, 2023
Cited by: 155, Year: 2023
MVBench: A comprehensive multi-modal video understanding benchmark
K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 150, Year: 2024
VideoLLM: Modeling video sequence with large language models
G Chen, YD Zheng, J Wang, J Xu, Y Huang, J Pan, Y Wang, Y Wang, ...
arXiv preprint arXiv:2305.13292, 2023
Cited by: 71, Year: 2023
DCAN: Improving temporal action detection via dual context aggregation
G Chen, YD Zheng, L Wang, T Lu
Proceedings of the AAAI conference on artificial intelligence 36 (1), 248-257, 2022
Cited by: 64, Year: 2022
InternVideo2: Scaling video foundation models for multimodal video understanding
Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei, R Zheng, J Xu, Z Wang, ...
arXiv preprint arXiv:2403.15377, 2024
Cited by: 59, Year: 2024
Video Mamba Suite: State space model as a versatile alternative for video understanding
G Chen, Y Huang, J Xu, B Pei, Z Chen, Z Li, J Wang, K Li, T Lu, L Wang
arXiv preprint arXiv:2403.09626, 2024
Cited by: 47, Year: 2024
BasicTAD: An astounding RGB-only baseline for temporal action detection
M Yang, G Chen, YD Zheng, T Lu, L Wang
Computer Vision and Image Understanding 232, 103692, 2023
Cited by: 40, Year: 2023
InternVideo-Ego4D: A pack of champion solutions to Ego4D challenges
G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ...
arXiv preprint arXiv:2211.09529, 2022
Cited by: 40, Year: 2022
AVSegFormer: Audio-visual segmentation with transformer
S Gao, Z Chen, G Chen, W Wang, T Lu
Proceedings of the AAAI Conference on Artificial Intelligence 38 (11), 12155 …, 2024
Cited by: 31, Year: 2024
Memory-and-anticipation transformer for online action understanding
J Wang*, G Chen*, Y Huang, L Wang, T Lu
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 27, Year: 2023
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
Z Chen, J Wang, W Wang, G Chen, E Xie, P Luo, T Lu
arXiv preprint arXiv:2111.02394, 2021
Cited by: 20, Year: 2021
Retrieval-augmented egocentric video captioning
J Xu, Y Huang, J Hou, G Chen, Y Zhang, R Feng, W Xie
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 14, Year: 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Y Huang, G Chen, J Xu, M Zhang, L Yang, B Pei, H Zhang, L Dong, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 12, Year: 2024
MRSN: Multi-relation support network for video action detection
YD Zheng, G Chen, M Yuan, T Lu
2023 IEEE International Conference on Multimedia and Expo (ICME), 1026-1031, 2023
Cited by: 10, Year: 2023
EgoVideo: Exploring egocentric foundation model and downstream adaptation
B Pei, G Chen, J Xu, Y He, Y Liu, K Pan, Y Huang, Y Wang, T Lu, L Wang, ...
arXiv preprint arXiv:2406.18070, 2024
Cited by: 3, Year: 2024
Matching Compound Prototypes for Few-Shot Action Recognition
Y Huang, L Yang, G Chen, H Zhang, F Lu, Y Sato
International Journal of Computer Vision, 1-26, 2024
Cited by: 2, Year: 2024
Champion solution for the WSDM2023 Toloka VQA challenge
S Gao, Z Chen, G Chen, W Wang, T Lu
arXiv preprint arXiv:2301.09045, 2023
Cited by: 2, Year: 2023