Micah Carroll
PhD student, UC Berkeley
Verified email at berkeley.edu
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 495; Year: 2023
On the Utility of Learning About Humans for Human-AI Coordination
M Carroll, R Shah, MK Ho, T Griffiths, S Seshia, P Abbeel, A Dragan
Advances in Neural Information Processing Systems, 2019, 5174-5185, 2019
Cited by: 476; Year: 2019
Harms from Increasingly Agentic Algorithmic Systems
A Chan, R Salganik, A Markelius, C Pang, N Rajkumar, D Krasheninnikov, ...
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023
Cited by: 126*; Year: 2023
Estimating and Penalizing Induced Preference Shifts in Recommender Systems
M Carroll, A Dragan, S Russell, D Hadfield-Menell
International Conference on Machine Learning, 2022 (Spotlight), 2686-2708, 2022
Cited by: 79*; Year: 2022
Characterizing Manipulation from AI Systems
M Carroll*, A Chan*, H Ashton, D Krueger
EEAMO 2023, 2023
Cited by: 65; Year: 2023
Engagement, user satisfaction, and the amplification of divisive content on social media
S Milli, M Carroll, Y Wang, S Pandey, S Zhao, AD Dragan
arXiv preprint arXiv:2305.16941, 2023
Cited by: 52*; Year: 2023
Uni[MASK]: Unified inference in sequential decision problems
M Carroll, O Paradise, J Lin, R Georgescu, M Sun, D Bignell, S Milani, ...
NeurIPS 2022 (Oral), 2022
Cited by: 42*; Year: 2022
Evaluating the Robustness of Collaborative Agents
P Knott, M Carroll, S Devlin, K Ciosek, K Hofmann, AD Dragan, R Shah
AAMAS 2021 (Extended Abstract), 2021
Cited by: 33; Year: 2021
Beyond preferences in AI alignment
T Zhi-Xuan, M Carroll, M Franklin, H Ashton
Philosophical Studies, 1-51, 2024
Cited by: 17; Year: 2024
AI alignment with changing and influenceable reward functions
M Carroll, D Foote, A Siththaranjan, S Russell, A Dragan
arXiv preprint arXiv:2405.17713, 2024
Cited by: 17; Year: 2024
Humanity's Last Exam
L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, S Shi, M Choi, A Agrawal, ...
arXiv preprint arXiv:2501.14249, 2025
Cited by: 12; Year: 2025
Optimal Behavior Prior: Data-Efficient Human Models for Improved Human-AI Collaboration
M Yang, M Carroll, A Dragan
NeurIPS 2022 Human in the Loop Learning (HiLL) Workshop, 2022
Cited by: 10; Year: 2022
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
M Williams*, M Carroll*, A Narang, C Weisser, B Murphy, A Dragan
arXiv preprint arXiv:2411.02306, 2024
Cited by: 7*; Year: 2024
Who Needs to Know? Minimal Knowledge for Optimal Coordination
N Lauffer, A Shah, M Carroll, MD Dennis, S Russell
International Conference on Machine Learning 2023, 18599-18613, 2023
Cited by: 5; Year: 2023
Time-Efficient Reward Learning via Visually Assisted Cluster Ranking
D Zhang, M Carroll, A Bobu, A Dragan
NeurIPS 2022 Human in the Loop Learning (HiLL) Workshop, 2022
Cited by: 5; Year: 2022
Overview of current AI alignment approaches
M Carroll
Cited by: 3; Year: 2018
Truthfulness Without Supervision: Model Evaluation Using Peer Prediction
T Qiu, M Carroll, C Allen