Improving prosody with linguistic and BERT derived features in multi-speaker based Mandarin Chinese neural TTS Y Xiao, L He, H Ming, FK Soong ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 63 | 2020 |
Paired phone-posteriors approach to ESL pronunciation quality assessment Y Xiao, FK Soong, W Hu bdl 1 (782d), 3, 2018 | 20 | 2018 |
ProsodySpeech: Towards advanced prosody model for neural text-to-speech Y Yi, L He, S Pan, X Wang, Y Xiao ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 13 | 2022 |
Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference. Y Xiao, FK Soong INTERSPEECH, 1755-1759, 2017 | 12 | 2017 |
UniStyle: Unified style modeling for speaking style captioning and stylistic speech synthesis X Zhu, W Tian, X Wang, L He, Y Xiao, X Wang, X Tan, S Zhao, L Xie Proceedings of the 32nd ACM International Conference on Multimedia, 7513-7522, 2024 | 8 | 2024 |
ContextSpeech: Expressive and efficient text-to-speech for paragraph reading Y Xiao, S Zhang, X Wang, X Tan, L He, S Zhao, FK Soong, T Lee arXiv preprint arXiv:2307.00782, 2023 | 8 | 2023 |
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning H Guo, F Xie, J Kang, Y Xiao, X Wu, H Meng IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024 | 3 | 2024 |
Contrastive context-speech pretraining for expressive text-to-speech synthesis Y Xiao, X Wang, X Tan, L He, X Zhu, S Zhao, T Lee Proceedings of the 32nd ACM International Conference on Multimedia, 2099-2107, 2024 | 2 | 2024 |
Improving FastSpeech TTS with efficient self-attention and compact feed-forward network Y Xiao, X Wang, L He, FK Soong ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 2 | 2022 |
PodAgent: A Comprehensive Framework for Podcast Generation Y Xiao, L He, H Guo, F Xie, T Lee arXiv preprint arXiv:2503.00455, 2025 | | 2025 |
Audio-FLAN: A Preliminary Release L Xue, Z Zhou, J Pan, Z Li, S Fan, Y Ma, S Cheng, D Yang, H Guo, Y Xiao, ... arXiv preprint arXiv:2502.16584, 2025 | | 2025 |
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training X Zhu, L He, Y Xiao, X Wang, X Tan, S Zhao, L Xie arXiv preprint arXiv:2501.04416, 2025 | | 2025 |