Sihan Chen
Title
Cited by
Year
CPTR: Full transformer network for image captioning
W Liu, S Chen, L Guo, X Zhu, J Liu
arXiv preprint arXiv:2101.10804, 2021
Cited by: 204 | Year: 2021
VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset
S Chen, H Li, Q Wang, Z Zhao, M Sun, X Zhu, J Liu
Advances in Neural Information Processing Systems 36, 72842-72866, 2023
Cited by: 84 | Year: 2023
VALOR: Vision-audio-language omni-perception pretraining model and dataset
S Chen, X He, L Guo, X Zhu, W Wang, J Tang, J Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023
Cited by: 82 | Year: 2023
ChatBridge: Bridging modalities with large language model as a language catalyst
Z Zhao, L Guo, T Yue, S Chen, S Shao, X Zhu, Z Yuan, J Liu
arXiv preprint arXiv:2305.16103, 2023
Cited by: 44 | Year: 2023
VL-Mamba: Exploring state space models for multimodal learning
Y Qiao, Z Yu, L Guo, S Chen, Z Zhao, M Sun, Q Wu, J Liu
arXiv preprint arXiv:2403.13600, 2024
Cited by: 36 | Year: 2024
Global-local propagation network for RGB-D semantic segmentation
S Chen, X Zhu, W Liu, X He, J Liu
arXiv preprint arXiv:2101.10801, 2021
Cited by: 24 | Year: 2021
VLAB: Enhancing video language pre-training by feature adapting and blending
X He, S Chen, F Ma, Z Huang, X Jin, Z Liu, D Fu, Y Yang, J Liu, J Feng
IEEE Transactions on Multimedia, 2023
Cited by: 19 | Year: 2023
Sounding video generator: A unified framework for text-guided sounding video generation
J Liu, W Wang, S Chen, X Zhu, J Liu
IEEE Transactions on Multimedia 26, 141-153, 2023
Cited by: 8 | Year: 2023
MM21 pre-training for video understanding challenge: Video captioning with pretraining techniques
S Chen, X Zhu, D Hao, W Liu, J Liu, Z Zhao, L Guo, J Liu
Proceedings of the 29th ACM International Conference on Multimedia, 4853-4857, 2021
Cited by: 7 | Year: 2021
COSA: Concatenated sample pretrained vision-language foundation model
S Chen, X He, H Li, X Jin, J Feng, J Liu
The Twelfth International Conference on Learning Representations, 2023
Cited by: 6 | Year: 2023
GLOBER: coherent non-autoregressive video generation via global guided video decoder
M Sun, W Wang, Z Qin, J Sun, S Chen, J Liu
Advances in Neural Information Processing Systems 36, 2024
Cited by: 2 | Year: 2024
EAVL: Explicitly Align Vision and Language for Referring Image Segmentation
Y Yan, X He, W Wang, S Chen, J Liu
arXiv preprint arXiv:2308.09779, 2023
Cited by: 2 | Year: 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Z Liu, S Chen, L Guo, H Li, X He, J Liu
Proceedings of the 31st ACM International Conference on Multimedia, 5120-5131, 2023
Cited by: 1 | Year: 2023
Fuse and Calibrate: A Bi-directional Vision-Language Guided Framework for Referring Image Segmentation
Y Yan, X He, S Chen, S Lu, J Liu
International Conference on Intelligent Computing, 313-324, 2024
Year: 2024
Calibration & Reconstruction: Deeply Integrated Language for Referring Image Segmentation
Y Yan, X He, S Chen, J Liu
Proceedings of the 2024 International Conference on Multimedia Retrieval …, 2024
Year: 2024
Articles 1–15