Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2070 | 2023 |
ESPnet: End-to-end speech processing toolkit S Watanabe, T Hori, S Karita, T Hayashi, J Nishitoba, Y Unno, NEY Soplin, ... arXiv preprint arXiv:1804.00015, 2018 | 1676 | 2018 |
A comparative study on Transformer vs RNN in speech applications S Karita, N Chen, T Hayashi, T Hori, H Inaguma, Z Jiang, M Someki, ... 2019 IEEE automatic speech recognition and understanding workshop (ASRU …, 2019 | 859 | 2019 |
WaveGrad: Estimating gradients for waveform generation N Chen, Y Zhang, H Zen, RJ Weiss, M Norouzi, W Chan International Conference on Learning Representations, 2021 | 791 | 2021 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 614 | 2024 |
Google USM: Scaling automatic speech recognition beyond 100 languages Y Zhang, W Han, J Qin, Y Wang, A Bapna, Z Chen, N Chen, B Li, ... arXiv preprint arXiv:2303.01037, 2023 | 255 | 2023 |
Zero-shot multi-speaker text-to-speech with state-of-the-art neural speaker embeddings E Cooper, CI Lai, Y Yasuda, F Fang, X Wang, N Chen, J Yamagishi ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 214 | 2020 |
Deep feature for text-dependent speaker verification Y Liu, Y Qian, N Chen, T Fu, Y Zhang, K Yu Speech Communication 73, 1-13, 2015 | 214 | 2015 |
ASSERT: Anti-spoofing with squeeze-excitation and residual networks CI Lai, N Chen, J Villalba, N Dehak arXiv preprint arXiv:1904.01120, 2019 | 199 | 2019 |
Noise2Music: Text-conditioned music generation with diffusion models Q Huang, DS Park, T Wang, TI Denk, A Ly, N Chen, Z Zhang, Z Zhang, ... arXiv preprint arXiv:2302.03917, 2023 | 163 | 2023 |
State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and speakers in the wild evaluations J Villalba, N Chen, D Snyder, D Garcia-Romero, A McCree, G Sell, ... Computer Speech & Language 60, 101026, 2020 | 150 | 2020 |
x-vectors meet emotions: A study on dependencies between emotion and speaker recognition R Pappagari, T Wang, J Villalba, N Chen, N Dehak ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 149 | 2020 |
Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict Y Higuchi, S Watanabe, N Chen, T Ogawa, T Kobayashi arXiv preprint arXiv:2005.08700, 2020 | 145 | 2020 |
Non-autoregressive transformer for speech recognition N Chen, S Watanabe, J Villalba, P Żelasko, N Dehak IEEE Signal Processing Letters 28, 121-125, 2020 | 143 | 2020 |
State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18. J Villalba, N Chen, D Snyder, D Garcia-Romero, A McCree, G Sell, ... Interspeech, 1488-1492, 2019 | 125 | 2019 |
Multi-task learning for text-dependent speaker verification N Chen, Y Qian, K Yu Proc. 16th Annual Conference of the International Speech Communication …, 2015 | 123 | 2015 |
Robust deep feature for spoofing detection—The SJTU system for ASVspoof 2015 challenge N Chen, Y Qian, H Dinkel, B Chen, K Yu Sixteenth annual conference of the international speech communication …, 2015 | 108 | 2015 |
Age estimation in short speech utterances based on LSTM recurrent neural networks R Zazo, PS Nidadavolu, N Chen, J Gonzalez-Rodriguez, N Dehak IEEE Access 6, 22524-22530, 2018 | 103 | 2018 |
Overview of BTAS 2016 speaker anti-spoofing competition P Korshunov, S Marcel, H Muckenhirn, AR Gonçalves, AGS Mello, ... 2016 IEEE 8th international conference on biometrics theory, applications …, 2016 | 101 | 2016 |
End-to-end spoofing detection with raw waveform CLDNNs H Dinkel, N Chen, Y Qian, K Yu 2017 IEEE international conference on acoustics, speech and signal …, 2017 | 93 | 2017 |