Josef Dai
Other names: Juntao Dai
Verified email at zju.edu.cn
Title · Cited by · Year
Baichuan 2: Open large-scale language models
A Yang, B Xiao, B Wang, B Zhang, C Bian, C Yin, C Lv, D Pan, D Wang, ...
arXiv preprint arXiv:2309.10305, 2023
445* · 2023
BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset
J Ji, M Liu, J Dai, X Pan, C Zhang, C Bian, B Chen, R Sun, Y Wang, ...
Advances in Neural Information Processing Systems 36, 2024
251 · 2024
Safe RLHF: Safe reinforcement learning from human feedback
J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang, Y Yang
The Twelfth International Conference on Learning Representations (Spotlight), 2024
185 · 2024
AI alignment: A comprehensive survey
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ...
arXiv preprint arXiv:2310.19852, 2023
177 · 2023
Constrained update projection approach to safe policy optimization
L Yang, J Ji, J Dai, L Zhang, B Zhou, P Li, Y Yang, G Pan
Advances in Neural Information Processing Systems 35, 9111-9124, 2022
61* · 2022
Safety gymnasium: A unified safe reinforcement learning benchmark
J Ji, B Zhang, J Zhou, X Pan, W Huang, R Sun, Y Geng, Y Zhong, J Dai, ...
Advances in Neural Information Processing Systems 36, 2023
59* · 2023
Aligner: Achieving efficient alignment through weak-to-strong correction
J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, J Dai, Y Yang
arXiv preprint arXiv:2402.02416, 2024
40 · 2024
OmniSafe: An infrastructure for accelerating safe reinforcement learning research
J Ji, J Zhou, B Zhang, J Dai, X Pan, R Sun, W Huang, Y Geng, M Liu, ...
Journal of Machine Learning Research 25 (285), 1-6, 2024
34 · 2024
Augmented proximal policy optimization for safe reinforcement learning
J Dai, J Ji, L Yang, Q Zheng, G Pan
Proceedings of the AAAI Conference on Artificial Intelligence 37 (6), 7288-7295, 2023
13 · 2023
PKU-SafeRLHF: A safety alignment preference dataset for Llama-family models
J Ji, D Hong, B Zhang, B Chen, J Dai, B Zheng, T Qiu, B Li, Y Yang
arXiv preprint arXiv:2406.15513, 2024
9 · 2024
Reward Generalization in RLHF: A Topological Perspective
T Qiu, F Zeng, J Ji, D Yan, K Wang, J Zhou, Y Han, J Dai, X Pan, Y Yang
arXiv preprint arXiv:2402.10184, 2024
4 · 2024
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
J Dai, T Chen, X Wang, Z Yang, T Chen, J Ji, Y Yang
arXiv preprint arXiv:2406.14477, 2024
1 · 2024
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
J Ji, D Hong, B Zhang, B Chen, J Dai, B Zheng, T Qiu, B Li, Y Yang
arXiv preprint arXiv:2406.15513, 2024
2024
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
J Dai, Y Yang, Q Zheng, G Pan
Forty-first International Conference on Machine Learning, 2024
2024