Learning audio-visual speech representation by masked multimodal cluster prediction B Shi, WN Hsu, K Lakhotia, A Mohamed arXiv preprint arXiv:2201.02184, 2022 | 293 | 2022 |
Scaling speech technology to 1,000+ languages V Pratap, A Tjandra, B Shi, P Tomasello, A Babu, S Kundu, A Elkahky, ... Journal of Machine Learning Research 25 (97), 1-52, 2024 | 245 | 2024 |
Voicebox: Text-guided multilingual universal speech generation at scale M Le, A Vyas, B Shi, B Karrer, L Sari, R Moritz, M Williamson, V Manohar, ... Advances in Neural Information Processing Systems 36, 2024 | 206 | 2024 |
Robust self-supervised audio-visual speech recognition B Shi, WN Hsu, A Mohamed arXiv preprint arXiv:2201.01763, 2022 | 118 | 2022 |
Scaling autoregressive multi-modal models: Pretraining and instruction tuning L Yu, B Shi, R Pasunuru, B Muller, O Golovneva, T Wang, A Babu, B Tang, ... arXiv preprint arXiv:2309.02591 2 (3), 2023 | 112 | 2023 |
Comparative layer-wise analysis of self-supervised speech models A Pasad, B Shi, K Livescu ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 98 | 2023 |
American Sign Language fingerspelling recognition in the wild B Shi, AM Del Rio, J Keane, J Michaux, D Brentari, G Shakhnarovich, ... 2018 IEEE Spoken Language Technology Workshop (SLT), 145-152, 2018 | 90 | 2018 |
Offloading guidelines for augmented reality applications on wearable devices B Shi, J Yang, Z Huang, P Hui Proceedings of the 23rd ACM international conference on Multimedia, 1271-1274, 2015 | 87 | 2015 |
Fingerspelling recognition in the wild with iterative visual attention B Shi, AM Del Rio, J Keane, D Brentari, G Shakhnarovich, K Livescu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019 | 81 | 2019 |
Few-shot acoustic event detection via meta learning B Shi, M Sun, KC Puvvada, CC Kao, S Matsoukas, C Wang ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 72 | 2020 |
Audiobox: Unified audio generation with natural language prompts A Vyas, B Shi, M Le, A Tjandra, YC Wu, B Guo, J Zhang, X Zhang, ... arXiv preprint arXiv:2312.15821, 2023 | 68 | 2023 |
Open-domain sign language translation learned from online video B Shi, D Brentari, G Shakhnarovich, K Livescu arXiv preprint arXiv:2205.12870, 2022 | 43 | 2022 |
A cross-task analysis of text span representations S Toshniwal, H Shi, B Shi, L Gao, K Livescu, K Gimpel arXiv preprint arXiv:2006.03866, 2020 | 43 | 2020 |
Expresso: A benchmark and analysis of discrete expressive speech resynthesis TA Nguyen, WN Hsu, A d'Avirro, B Shi, I Gat, M Fazel-Zarani, T Remez, ... arXiv preprint arXiv:2308.05725, 2023 | 39 | 2023 |
u-HuBERT: Unified mixed-modal speech pretraining and zero-shot transfer to unlabeled modality WN Hsu, B Shi Advances in Neural Information Processing Systems 35, 21157-21170, 2022 | 34 | 2022 |
Fingerspelling detection in American Sign Language B Shi, D Brentari, G Shakhnarovich, K Livescu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 31 | 2021 |
MuAViC: A multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation M Anwar, B Shi, V Goswami, WN Hsu, J Pino, C Wang arXiv preprint arXiv:2303.00628, 2023 | 29 | 2023 |
Semi-supervised acoustic event detection based on tri-training B Shi, M Sun, CC Kao, V Rozgic, S Matsoukas, C Wang ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 24 | 2019 |
Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition B Shi, K Livescu 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2017 | 22 | 2017 |
Compression of acoustic event detection models with low-rank matrix factorization and quantization training B Shi, M Sun, CC Kao, V Rozgic, S Matsoukas, C Wang arXiv preprint arXiv:1905.00855, 2019 | 18 | 2019 |