Deep speech 2: End-to-end speech recognition in English and Mandarin D Amodei, S Ananthanarayanan, R Anubhai, J Bai, E Battenberg, C Case, ... International conference on machine learning, 173-182, 2016 | 3836 | 2016 |
librosa: Audio and music signal analysis in Python. B McFee, C Raffel, D Liang, DPW Ellis, M McVicar, E Battenberg, O Nieto SciPy, 18-24, 2015 | 3259 | 2015 |
Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis Y Wang, D Stanton, Y Zhang, RJS Ryan, E Battenberg, J Shor, Y Xiao, ... International conference on machine learning, 5180-5189, 2018 | 986 | 2018 |
Towards end-to-end prosody transfer for expressive speech synthesis with Tacotron RJ Skerry-Ryan, E Battenberg, Y Xiao, Y Wang, D Stanton, J Shor, ... International conference on machine learning, 4693-4702, 2018 | 697 | 2018 |
Lasagne: first release S Dieleman, J Schlüter, C Raffel, E Olson, SK Sønderby, D Nouri, ... Zenodo: Geneva, Switzerland 3, 74, 2015 | 476* | 2015 |
Exploring neural transducers for end-to-end speech recognition E Battenberg, J Chen, R Child, A Coates, YGY Li, H Liu, S Satheesh, ... 2017 IEEE automatic speech recognition and understanding workshop (ASRU …, 2017 | 279* | 2017 |
Location-relative attention mechanisms for robust long-form speech synthesis E Battenberg, RJ Skerry-Ryan, S Mariooryad, D Stanton, D Kao, ... ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 132 | 2020 |
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis RJ Weiss, RJ Skerry-Ryan, E Battenberg, S Mariooryad, DP Kingma ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 123 | 2021 |
librosa/librosa: 0.8.0 B McFee, V Lostanlen, A Metsai, M McVicar, S Balke, C Thomé, C Raffel, ... Version 0.8.0, Zenodo, doi 10, 2020 | 108 | 2020 |
Uncovering latent style factors for expressive speech synthesis Y Wang, RJ Skerry-Ryan, Y Xiao, D Stanton, J Shor, E Battenberg, ... arXiv preprint arXiv:1711.00520, 2017 | 88 | 2017 |
Semi-supervised generative modeling for controllable speech synthesis R Habib, S Mariooryad, M Shannon, E Battenberg, RJ Skerry-Ryan, ... arXiv preprint arXiv:1910.01709, 2019 | 61 | 2019 |
viktorandreevichmorozov, K B McFee, A Metsai, M McVicar, S Balke, C Thomé, C Raffel, F Zalkow, ... Moore, R. Bittner, S. Hidaka, Z. Wei, nullmightybofo, D. Herenú, F.-R …, 2020 | 58 | 2020 |
Effective use of variational embedding capacity in expressive end-to-end speech synthesis E Battenberg, S Mariooryad, D Stanton, RJ Skerry-Ryan, M Shannon, ... arXiv preprint arXiv:1906.03402, 2019 | 58 | 2019 |
Accelerating Non-Negative Matrix Factorization for Audio Source Separation on Multi-Core and Many-Core Architectures. E Battenberg, D Wessel ISMIR 9, 501-506, 2009 | 57 | 2009 |
librosa 0.5.0 B McFee, M McVicar, O Nieto, S Balke, C Thome, D Liang, E Battenberg, ... Zenodo. URL: https://doi.org/10.5281, 2017 | 55 | 2017 |
Implementing real-time partitioned convolution algorithms on conventional operating systems E Battenberg, R Avizienis Proceedings of the 14th International Conference on Digital Audio Effects …, 2011 | 51 | 2011 |
librosa: 0.4.1 B McFee, M McVicar, C Raffel, D Liang, O Nieto, E Battenberg, J Moore, ... Zenodo, 2015 | 48 | 2015 |
Analyzing Drum Patterns Using Conditional Deep Belief Networks. E Battenberg, D Wessel ISMIR, 37-42, 2012 | 46 | 2012 |
librosa/librosa: 0.7.2 B McFee, V Lostanlen, M McVicar, A Metsai, S Balke, C Thomé, C Raffel, ... Zenodo, Jan 13, 2020 | 44 | 2020 |
Speaker generation D Stanton, M Shannon, S Mariooryad, RJ Skerry-Ryan, E Battenberg, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 34 | 2022 |