Title | Authors | Venue | Cited by | Year |
AST: Audio spectrogram transformer | Y Gong, YA Chung, J Glass | arXiv preprint arXiv:2104.01778, 2021 | 979 | 2021 |
An unsupervised autoregressive model for speech representation learning | YA Chung, WN Hsu, H Tang, J Glass | arXiv preprint arXiv:1904.03240, 2019 | 460 | 2019 |
W2v-BERT: Combining contrastive learning and masked language modeling for self-supervised speech pre-training | YA Chung, Y Zhang, W Han, CC Chiu, J Qin, R Pang, Y Wu | 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2021 | 415 | 2021 |
SSAST: Self-supervised audio spectrogram transformer | Y Gong, CI Lai, YA Chung, J Glass | Proceedings of the AAAI Conference on Artificial Intelligence 36 (10), 10699 …, 2022 | 291 | 2022 |
Generative pre-training for speech with autoregressive predictive coding | YA Chung, J Glass | ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 221 | 2020 |
Audio word2vec: Unsupervised learning of audio segment representations using sequence-to-sequence autoencoder | YA Chung, CC Wu, CH Shen, HY Lee, LS Lee | arXiv preprint arXiv:1603.00982, 2016 | 220 | 2016 |
Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech | YA Chung, J Glass | arXiv preprint arXiv:1803.08976, 2018 | 219 | 2018 |
PSLA: Improving audio tagging with pretraining, sampling, labeling, and aggregation | Y Gong, YA Chung, J Glass | IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3292-3306, 2021 | 175 | 2021 |
Semi-supervised training for improving data efficiency in end-to-end speech synthesis | YA Chung, Y Wang, WN Hsu, Y Zhang, RJ Skerry-Ryan | ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 142 | 2019 |
Disentangling correlated speaker and noise for speech synthesis via data augmentation and adversarial factorization | WN Hsu, Y Zhang, RJ Weiss, YA Chung, Y Wang, Y Wu, J Glass | ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 130 | 2019 |
Vector-quantized autoregressive predictive coding | YA Chung, H Tang, J Glass | arXiv preprint arXiv:2005.08392, 2020 | 126 | 2020 |
Unsupervised cross-modal alignment of speech and text embedding spaces | YA Chung, WH Weng, S Tong, J Glass | Advances in neural information processing systems 31, 2018 | 113 | 2018 |
Cost-aware pre-training for multiclass cost-sensitive deep learning | YA Chung, HT Lin, SW Yang | arXiv preprint arXiv:1511.09337, 2015 | 111 | 2015 |
Supervised and unsupervised transfer learning for question answering | YA Chung, HY Lee, J Glass | arXiv preprint arXiv:1711.05345, 2017 | 108 | 2017 |
Non-autoregressive predictive coding for learning speech representations from local dependencies | AH Liu, YA Chung, J Glass | arXiv preprint arXiv:2011.00406, 2020 | 100 | 2020 |
Learning deep representations of medical images using siamese CNNs with application to content-based image retrieval | YA Chung, WH Weng | arXiv preprint arXiv:1711.08490, 2017 | 93 | 2017 |
SLAM: A unified encoder for speech and language modeling via speech-text joint pre-training | A Bapna, Y Chung, N Wu, A Gulati, Y Jia, JH Clark, M Johnson, J Riesa, ... | arXiv preprint arXiv:2110.10329, 2021 | 91 | 2021 |
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation | L Barrault, YA Chung, MC Meglioli, D Dale, N Dong, PA Duquenne, ... | arXiv preprint arXiv:2308.11596, 2023 | 88 | 2023 |
Seamless: Multilingual Expressive and Streaming Speech Translation | L Barrault, YA Chung, MC Meglioli, D Dale, N Dong, M Duppenthaler, ... | arXiv preprint arXiv:2312.05187, 2023 | 81 | 2023 |
SPLAT: Speech-language joint pre-training for spoken language understanding | YA Chung, C Zhu, M Zeng | arXiv preprint arXiv:2010.02295, 2020 | 79 | 2020 |