Haohan Guo

Cited by

	All	Since 2020
Citations	569	557
h-index	13	12
i10-index	14	14

Citations per year: 2019: 8, 2020: 22, 2021: 40, 2022: 44, 2023: 77, 2024: 266, 2025: 107

Public access

1 article

0 articles

available

not available

Based on funding mandates

Co-authors

Xixin Wu, The Chinese University of Hong Kong (verified email at se.cuhk.edu.hk)
Dongchao Yang, Chinese University of Hong Kong (verified email at se.cuhk.edu.hk)
Lei Xie, Northwestern Polytechnical University (verified email at nwpu.edu.cn)
Feng-Long xie, Xiaohongshu (verified email at xiaohongshu.com)
Lei He, Principal Scientist Manager, Microsoft (verified email at microsoft.com)
Shaofei Zhang, Senior Software Engineer, Microsoft (verified email at microsoft.com)
Jiawen Kang, The Chinese University of Hong Kong (verified email at se.cuhk.edu.hk)
Yujia Xiao, The Chinese University of Hong Kong (verified email at link.cuhk.edu.hk)
Shan Yang, Tencent AI Lab (verified email at nwpu-aslp.org)
Dan Su, Tencent AI Lab (verified email at tencent.com)
Chunlei Zhang, SEED, Bytedance; Ex-Tencent AI Lab (verified email at bytedance.com)
Dong Yu (俞栋), Distinguished Scientist @ Tencent AI Lab, ACM/IEEE/ISCA Fellow (verified email at global.tencent.com)


Haohan Guo

Chinese University of Hong Kong

Verified email at se.cuhk.edu.hk

Speech Synthesis, Voice Conversion, Speech Processing


Title	Authors	Venue	Cited by	Year
UniAudio: Towards Universal Audio Generation with Large Language Models	D Yang, J Tian, X Tan, R Huang, S Liu, H Guo, X Chang, J Shi, J Bian, ...	Forty-first International Conference on Machine Learning, 2024	132*	2024
Conversational end-to-end TTS for voice agents	H Guo, S Zhang, FK Soong, L He, L Xie	2021 IEEE Spoken Language Technology Workshop (SLT), 403-409, 2021	80	2021
BASE TTS: Lessons from building a billion-parameter text-to-speech model on 100k hours of data	M Łajszczak, G Cámbara, Y Li, F Beyhan, A Van Korlaar, F Yang, A Joly, ...	arXiv preprint arXiv:2402.08093, 2024	70	2024
A new GAN-based end-to-end TTS training algorithm	H Guo, FK Soong, L He, L Xie	INTERSPEECH, 2019	61	2019
Exploiting syntactic features in a parsed tree to improve end-to-end TTS	H Guo, FK Soong, L He, L Xie	INTERSPEECH, 2019	40	2019
Single-Codec: Single-codebook speech codec towards high-performance speech generation	H Li, L Xue, H Guo, X Zhu, Y Lv, L Xie, Y Chen, H Yin, Z Li	arXiv preprint arXiv:2406.07422, 2024	25	2024
Improving adversarial waveform generation based singing voice conversion with harmonic signals	H Guo, Z Zhou, F Meng, K Liu	ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022	19	2022
SimpleSpeech: Towards simple and efficient text-to-speech with scalar latent transformer diffusion models	D Yang, D Wang, H Guo, X Chen, X Wu, H Meng	arXiv preprint arXiv:2406.02328, 2024	18	2024
UniAudio 1.5: Large language model-driven audio codec is a few-shot audio task learner	D Yang, H Guo, Y Wang, R Huang, X Li, X Tan, X Wu, H Meng	arXiv preprint arXiv:2406.10056, 2024	15	2024
MSMC-TTS: Multi-stage multi-codebook VQ-VAE based neural TTS	H Guo, F Xie, X Wu, FK Soong, H Meng	IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 1811-1824, 2023	15	2023
Feature reinforcement with word embedding and parsing information in neural TTS	H Ming, L He, H Guo, FK Soong	arXiv preprint arXiv:1901.00707, 2019	15	2019
FireRedTTS: A foundation text-to-speech framework for industry-level generative speech applications	HH Guo, K Liu, FY Shen, YC Wu, FL Xie, K Xie, KT Xu	arXiv preprint arXiv:2409.03283, 2024	14	2024
A multi-stage multi-codebook VQ-VAE approach to high-performance neural TTS	H Guo, F Xie, FK Soong, X Wu, H Meng	arXiv preprint arXiv:2209.10887, 2022	13	2022
Phonetic posteriorgrams based many-to-many singing voice conversion via adversarial training	H Guo, H Lu, N Hu, C Zhang, S Yang, L Xie, D Su, D Yu	arXiv preprint arXiv:2012.01837, 2020	11	2020
SimpleSpeech 2: Towards simple and efficient text-to-speech with flow-based scalar latent transformer diffusion models	D Yang, R Huang, Y Wang, H Guo, D Chong, S Liu, X Wu, H Meng	arXiv preprint arXiv:2408.13893, 2024	7	2024
A multi-scale time-frequency spectrogram discriminator for GAN-based non-autoregressive TTS	H Guo, H Lu, X Wu, H Meng	arXiv preprint arXiv:2203.01080, 2022	7	2022
SoCodec: A semantic-ordered multi-stream speech codec for efficient language model based text-to-speech synthesis	H Guo, F Xie, K Xie, D Yang, D Guo, X Wu, H Meng	2024 IEEE Spoken Language Technology Workshop (SLT), 645-651, 2024	5	2024
Addressing index collapse of large-codebook speech tokenizer with dual-decoding product-quantized variational auto-encoder	H Guo, F Xie, D Yang, H Lu, X Wu, H Meng	2024 IEEE Spoken Language Technology Workshop (SLT), 548-553, 2024	5	2024
Towards high-quality neural TTS for low-resource languages by learning compact speech representations	H Guo, F Xie, X Wu, H Lu, H Meng	arXiv preprint arXiv:2210.15131, 2022	5	2022
Cross-speaker encoding network for multi-talker speech recognition	J Kang, L Meng, M Cui, H Guo, X Wu, X Liu, H Meng	ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024	4	2024
