Rafael Mitkov Rafailov
Graduate Student, Stanford University
Verified email at stanford.edu
Title
Cited by
Year
Direct preference optimization: Your language model is secretly a reward model
R Rafailov, A Sharma, E Mitchell, CD Manning, S Ermon, C Finn
Advances in Neural Information Processing Systems 36, 53728-53741, 2023
Cited by: 2803; Year: 2023
Open x-embodiment: Robotic learning datasets and rt-x models
JJ Lim
IEEE International Conference on Robotics and Automation, 2024
Cited by: 511*; Year: 2024
Combo: Conservative offline model-based policy optimization
T Yu, A Kumar, R Rafailov, A Rajeswaran, S Levine, C Finn
Advances in neural information processing systems 34, 28954-28967, 2021
Cited by: 472; Year: 2021
Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback
K Tian, E Mitchell, A Zhou, A Sharma, R Rafailov, H Yao, C Finn, ...
arXiv preprint arXiv:2305.14975, 2023
Cited by: 266; Year: 2023
Openvla: An open-source vision-language-action model
MJ Kim, K Pertsch, S Karamcheti, T Xiao, A Balakrishna, S Nair, ...
arXiv preprint arXiv:2406.09246, 2024
Cited by: 255; Year: 2024
Diffusion model alignment using direct preference optimization
B Wallace, M Dang, R Rafailov, L Zhou, A Lou, S Purushwalkam, S Ermon, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 157; Year: 2024
Offline reinforcement learning from images with latent space models
R Rafailov, T Yu, A Rajeswaran, C Finn
Learning for dynamics and control, 1154-1168, 2021
Cited by: 147; Year: 2021
Offline meta-reinforcement learning with advantage weighting
E Mitchell, R Rafailov, XB Peng, S Levine, C Finn
International Conference on Machine Learning, 7780-7791, 2021
Cited by: 127; Year: 2021
Disentangling length from quality in direct preference optimization
R Park, R Rafailov, S Ermon, C Finn
arXiv preprint arXiv:2403.19159, 2024
Cited by: 92; Year: 2024
Direct preference optimization: Your language model is secretly a reward model, 2024
R Rafailov, A Sharma, E Mitchell, S Ermon, CD Manning, C Finn
URL https://arxiv.org/abs/2305.18290 2305, 2023
Cited by: 92; Year: 2023
From r to Q*: Your Language Model is Secretly a Q-Function
R Rafailov, J Hejna, R Park, C Finn
arXiv preprint arXiv:2404.12358, 2024
Cited by: 91; Year: 2024
Preference fine-tuning of llms should leverage suboptimal, on-policy data
F Tajwar, A Singh, A Sharma, R Rafailov, J Schneider, T Xie, S Ermon, ...
arXiv preprint arXiv:2404.14367, 2024
Cited by: 73; Year: 2024
Aligning modalities in vision large language models via preference fine-tuning
Y Zhou, C Cui, R Rafailov, C Finn, H Yao
arXiv preprint arXiv:2402.11411, 2024
Cited by: 72; Year: 2024
Contrastive preference learning: learning from human feedback without rl
J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum, WB Knox, D Sadigh
arXiv preprint arXiv:2310.13639, 2023
Cited by: 54; Year: 2023
Is model collapse inevitable? breaking the curse of recursion by accumulating real and synthetic data
M Gerstgrasser, R Schaeffer, A Dey, R Rafailov, H Sleight, J Hughes, ...
arXiv preprint arXiv:2404.01413, 2024
Cited by: 52; Year: 2024
Agent q: Advanced reasoning and learning for autonomous ai agents
P Putta, E Mills, N Garg, S Motwani, C Finn, D Garg, R Rafailov
arXiv preprint arXiv:2408.07199, 2024
Cited by: 48; Year: 2024
Visual adversarial imitation learning using variational models
R Rafailov, T Yu, A Rajeswaran, C Finn
Advances in Neural Information Processing Systems 34, 3016-3028, 2021
Cited by: 48; Year: 2021
An emulator for fine-tuning large language models using small language models
E Mitchell, R Rafailov, A Sharma, C Finn, CD Manning
arXiv preprint arXiv:2310.12962, 2023
Cited by: 43; Year: 2023
Vision-Based Manipulators Need to Also See from Their Hands.
K Hsu, MJ Kim, R Rafailov, J Wu, C Finn
ICLR, 2022
Cited by: 43; Year: 2022
Scaling laws for reward model overoptimization in direct alignment algorithms
R Rafailov, Y Chittepu, R Park, HS Sikchi, J Hejna, B Knox, C Finn, ...
Advances in Neural Information Processing Systems 37, 126207-126242, 2024
Cited by: 42; Year: 2024