BLVD: Building a large-scale 5D semantics benchmark for autonomous driving. J Xue, J Fang, T Li, B Zhang, P Zhang, Z Ye, J Dou. 2019 International Conference on Robotics and Automation (ICRA), 6685-6691, 2019 | 66 | 2019 |
CoMoSpeech: One-step speech and singing voice synthesis via consistency model. Z Ye, W Xue, X Tan, J Chen, Q Liu, Y Guo. Proceedings of the 31st ACM International Conference on Multimedia, 1831-1839, 2023 | 28 | 2023 |
FlashSpeech: Efficient Zero-Shot Speech Synthesis. Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun, J Pan, W Bian, S He, W Xue, ... arXiv preprint arXiv:2404.14700, 2024 | 7 | 2024 |
CoMoSVC: Consistency Model-based Singing Voice Conversion. Y Lu, Z Ye, W Xue, X Tan, Q Liu, Y Guo. arXiv preprint arXiv:2401.01792, 2024 | 6 | 2024 |
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models. S Wang, H Lin, Z Luo, Z Ye, G Chen, J Ma. arXiv preprint arXiv:2406.11288, 2024 | 5 | 2024 |
NAS-FM: neural architecture search for tunable and interpretable sound synthesis based on frequency modulation. Z Ye, W Xue, X Tan, Q Liu, Y Guo. arXiv preprint arXiv:2305.12868, 2023 | 4 | 2023 |
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model. Z Ye, P Sun, J Lei, H Lin, X Tan, Z Dai, Q Kong, J Chen, J Pan, Q Liu, ... arXiv preprint arXiv:2408.17175, 2024 | 2 | 2024 |
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation. J Chen, W Xue, X Tan, Z Ye, Q Liu, Y Guo. arXiv preprint arXiv:2405.07682, 2024 | 2 | 2024 |
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation. P Sun, S Cheng, X Li, Z Ye, H Liu, H Zhang, W Xue, Y Guo. arXiv preprint arXiv:2410.10676, 2024 | 1 | 2024 |
PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain. J Chen, Z Dai, Z Ye, X Tan, Q Liu, Y Guo, W Xue. Findings of the Association for Computational Linguistics: EMNLP 2024, 4253-4263, 2024 | | 2024 |