CLIP4Caption: CLIP for video caption M Tang, Z Wang, Z Liu, F Rao, D Li, X Li Proceedings of the 29th ACM International Conference on Multimedia, 4858-4862, 2021 | 139 | 2021 |
Discovery of millihertz X-ray oscillations in a transient ultraluminous X-ray source in M82 H Feng, F Rao, P Kaaret The Astrophysical Journal Letters 710 (2), L137, 2010 | 58 | 2010 |
Detection of strong short-term variability in NGC 6946 X-1 F Rao, H Feng, P Kaaret The Astrophysical Journal 722 (1), 620, 2010 | 40 | 2010 |
Low-frequency oscillations in XTE J1550−564 F Rao, T Belloni, L Stella, SN Zhang, T Li The Astrophysical Journal 714 (2), 1065, 2010 | 36 | 2010 |
CA-SSL: Class-agnostic semi-supervised learning for detection and segmentation L Qi, J Kuen, Z Lin, J Gu, F Rao, D Li, W Guo, Z Wen, MH Yang, J Jia European Conference on Computer Vision, 59-77, 2022 | 15 | 2022 |
Inter-X: Towards versatile human-human interaction analysis L Xu, X Lv, Y Yan, X Jin, S Wu, C Xu, Y Liu, Y Zhou, F Rao, X Sheng, Y Liu, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 11 | 2024 |
Multi-task multi-head attention memory network for fine-grained sentiment analysis Z Dai, W Dai, Z Liu, F Rao, H Chen, G Zhang, Y Ding, J Liu CCF International Conference on Natural Language Processing and Chinese …, 2019 | 11 | 2019 |
Tencent-MVSE: A large-scale benchmark dataset for multi-modal video similarity evaluation Z Zeng, Y Luo, Z Liu, F Rao, D Li, W Guo, Z Wen Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 10 | 2022 |
CLIP4Caption++: Multi-CLIP for video caption M Tang, Z Wang, Z Zeng, F Rao, D Li arXiv preprint arXiv:2110.05204, 2021 | 9 | 2021 |
Image captioning with multi-context synthetic data F Ma, Y Zhou, F Rao, Y Zhang, X Sun Proceedings of the AAAI Conference on Artificial Intelligence 38 (5), 4089-4097, 2024 | 7 | 2024 |
ReGenNet: Towards Human Action-Reaction Synthesis L Xu, Y Zhou, Y Yan, X Jin, W Zhu, F Rao, X Yang, W Zeng Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 5 | 2024 |
A similarity alignment model for video copy segment matching Z Liu, F Ma, T Wang, F Rao arXiv preprint arXiv:2305.15679, 2023 | 4 | 2023 |
Visual Perception by Large Language Model's Weights F Ma, H Xue, G Wang, Y Zhou, F Rao, S Yan, Y Zhang, S Wu, MZ Shou, ... arXiv preprint arXiv:2405.20339, 2024 | 3 | 2024 |
A dual-level detection method for video copy detection T Wang, F Ma, Z Liu, F Rao arXiv preprint arXiv:2305.12361, 2023 | 3 | 2023 |
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling J Yang, D Yin, Y Zhou, F Rao, W Zhai, Y Cao, ZJ Zha arXiv preprint arXiv:2410.10798, 2024 | 2 | 2024 |
Spatial-Semantic Collaborative Cropping for User Generated Content Y Su, Y Cao, J Deng, F Rao, Q Wu Proceedings of the AAAI Conference on Artificial Intelligence 38 (5), 4988-4997, 2024 | 1 | 2024 |
Number it: Temporal Grounding Videos like Flipping Manga Y Wu, X Hu, Y Sun, Y Zhou, W Zhu, F Rao, B Schiele, X Yang arXiv preprint arXiv:2411.10332, 2024 | | 2024 |
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model F Ma, Y Zhou, H Li, Z He, S Wu, F Rao, Y Zhang, X Sun arXiv preprint arXiv:2408.11795, 2024 | | 2024 |
Multi-Modal Generative Embedding Model F Ma, H Xue, G Wang, Y Zhou, F Rao, S Yan, Y Zhang, S Wu, MZ Shou, ... arXiv preprint arXiv:2405.19333, 2024 | | 2024 |
Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models F Ma, Y Zhou, Y Zhang, S Wu, Z Zhang, Z He, F Rao, X Sun Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | | 2024 |