PoseRAC: Pose saliency transformer for repetitive action counting Z Yao, X Cheng, Y Zou arXiv preprint arXiv:2303.08450, 2023 | 20 | 2023 |
FC-MTLF: A fine- and coarse-grained multi-task learning framework for cross-lingual spoken language understanding X Cheng, W Xu, Z Yao, Z Zhu, Y Li, H Li, Y Zou Proc. of Interspeech 2, 2023 | 18 | 2023 |
GhostT5: generate more features with cheap operations to improve textless spoken question answering X Cheng, Z Zhu, Z Yao, H Li, Y Li, Y Zou Proc. INTERSPEECH 2023, 1134-1138, 2023 | 12 | 2023 |
C2A-SLU: cross and contrastive attention for improving ASR robustness in spoken language understanding X Cheng, Z Yao, Z Zhu, Y Li, H Li, Y Zou Proc. of INTERSPEECH, 2023 | 11 | 2023 |
CAR: Controllable autoregressive modeling for visual generation Z Yao, J Li, Y Zhou, Y Liu, X Jiang, C Wang, F Zheng, Y Zou, L Li arXiv preprint arXiv:2410.04671, 2024 | 7 | 2024 |
Soul-Mix: Enhancing multimodal machine translation with manifold mixup X Cheng, Z Yao, Y Xin, H An, H Li, Y Li, Y Zou Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024 | 5 | 2024 |
FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model Z Yao, X Cheng, Z Huang Proceedings of the 32nd ACM International Conference on Multimedia, 3411-3420, 2024 | 2 | 2024 |
Recovering Global Data Distribution Locally in Federated Learning Z Yao BMVC 2024, 2024 | | 2024 |