Yifei Xin
Verified email at stu.pku.edu.cn
Title
Cited by
Year
VideoLLaMA 2: Advancing spatial-temporal modeling and audio understanding in video-LLMs
Z Cheng, S Leng, H Zhang, Y Xin, X Li, G Chen, Y Zhu, W Zhang, Z Luo, ...
arXiv preprint arXiv:2406.07476, 2024
Cited by: 167; Year: 2024
Improving text-audio retrieval by text-aware attention pooling and prior matrix revised loss
Y Xin, D Yang, Y Zou
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 36; Year: 2023
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification.
Y Xin, D Yang, Y Zou
INTERSPEECH, 1546-1550, 2022
Cited by: 17; Year: 2022
Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents
L Li, W Xu, J Guo, R Zhao, X Li, Y Yuan, B Zhang, Y Jiang, Y Xin, R Dang, ...
arXiv preprint arXiv:2410.13185, 2024
Cited by: 9; Year: 2024
Masked Audio Modeling with CLAP and Multi-Objective Learning
Y Xin, X Peng, Y Lu
Proc. INTERSPEECH 2023, 2763-2767, 2024
Cited by: 9; Year: 2024
Improving audio-text retrieval via hierarchical cross-modal interaction and auxiliary captions
Y Xin, Y Zou
Proc. INTERSPEECH 2023, 341-345, 2023
Cited by: 9; Year: 2023
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Y Zhu, B Li, Y Xin, L Xu
arXiv preprint arXiv:2411.02038, 2024
Cited by: 8; Year: 2024
Cooperative game modeling with weighted token-level alignment for audio-text retrieval
Y Xin, B Wang, L Shang
IEEE Signal Processing Letters 30, 1317-1321, 2023
Cited by: 7; Year: 2023
Improving weakly supervised sound event detection with causal intervention
Y Xin, D Yang, F Cui, Y Wang, Y Zou
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 7; Year: 2023
VideoLLaMA 2: Advancing spatial-temporal modeling and audio understanding in video-LLMs, 2024
Z Cheng, S Leng, H Zhang, Y Xin, X Li, G Chen, Y Zhu, W Zhang, Z Luo, ...
URL https://arxiv.org/abs/2406.07476 9, 0
Cited by: 7
Low-complexity acoustic scene classification with mismatch-devices using separable convolutions and coordinate attention
Y Xin, Y Zou, F Cui, Y Wang
DCASE2022 Challenge, Tech. Rep, 2022
Cited by: 6; Year: 2022
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
Y Xin, X Cheng, Z Zhu, X Yang, Y Zou
Proc. Interspeech 2024, 1670-1674, 2024
Cited by: 5; Year: 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
H Zhao, Y Xin, Z Yu, B Zhu, L Lu, Z Ma
Proc. Interspeech 2024, 52-56, 2024
Cited by: 5*; Year: 2024
Soul-mix: Enhancing multimodal machine translation with manifold mixup
X Cheng, Z Yao, Y Xin, H An, H Li, Y Li, Y Zou
Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024
Cited by: 5; Year: 2024
Improving speech enhancement via event-based query
Y Xin, X Peng, Y Lu
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 5; Year: 2023
Background-aware modeling for weakly supervised sound event detection
Y Xin, D Yang, Y Zou
Proc. ISCA Annu. Conf. Int. Speech Commun. Assoc, 1199-1203, 2023
Cited by: 5; Year: 2023
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
Y Xin, Z Zhu, X Cheng, X Yang, Y Zou
Proc. Interspeech 2024, 1140-1144, 2024
Cited by: 1; Year: 2024
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
R Dang, Y Yuan, W Zhang, Y Xin, B Zhang, L Li, L Wang, Q Zeng, X Li, ...
arXiv preprint arXiv:2501.05031, 2025
Year: 2025
Chain of Ideas: Revolutionizing Research in Idea Development with LLM Agents
L Li, W Xu, J Guo, R Zhao, X Li, Y Yuan, B Zhang, Y Jiang, Y Xin, R Dang, ...
Articles 1–19