Erik Jenner
Google DeepMind
Verified email at google.com
Title
Cited by
Year
Foundational challenges in assuring alignment and safety of large language models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
TMLR, 2024
Cited by: 163*; Year: 2024
imitation: Clean imitation learning implementations
A Gleave, M Taufeeque, J Rocamonde, E Jenner, SH Wang, S Toyer, ...
arXiv preprint arXiv:2211.11972, 2022
Cited by: 68; Year: 2022
Steerable Partial Differential Operators for Equivariant Neural Networks
E Jenner, M Weiler
ICLR, 2022
Cited by: 31; Year: 2022
Preprocessing Reward Functions for Interpretability
E Jenner, A Gleave
NeurIPS Cooperative AI workshop, 2021
Cited by: 13; Year: 2021
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
E Jenner, S Kapur, V Georgiev, C Allen, S Emmons, S Russell
NeurIPS, 2024
Cited by: 8; Year: 2024
When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning
L Lang, D Foote, S Russell, A Dragan, E Jenner, S Emmons
NeurIPS, 2024
Cited by: 8*; Year: 2024
STARC: A General Framework For Quantifying Differences Between Reward Functions
J Skalse, L Farnik, SR Motwani, E Jenner, A Gleave, A Abate
ICLR, 2023
Cited by: 7; Year: 2023
Calculus on MDPs: Potential shaping as a gradient
E Jenner, H van Hoof, A Gleave
arXiv preprint arXiv:2208.09570, 2022
Cited by: 7*; Year: 2022
A general framework for reward function distances
E Jenner, JMV Skalse, A Gleave
NeurIPS ML Safety Workshop, 2022
Cited by: 5; Year: 2022
A comparison of causal scrubbing, causal abstractions, and related methods
E Jenner, A Garriga-alonso, E Zverev
AI Alignment Forum, 2023
Cited by: 4; Year: 2023
Diffusion on syntax trees for program synthesis
S Kapur, E Jenner, S Russell
arXiv preprint arXiv:2405.20519, 2024
Cited by: 3; Year: 2024
Obfuscated Activations Bypass LLM Latent-Space Defenses
L Bailey, A Serrano, A Sheshadri, M Seleznyov, J Taylor, E Jenner, ...
arXiv preprint arXiv:2412.09565, 2024
Year: 2024
Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice
E Jenner, EF Sanmartín, FA Hamprecht
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021
Year: 2021
AI Can Conceal Undesirable Outputs Even Under White-Box Inspection
A Draguns, E Jenner
Replication: Fairness without demographics through Adversarially Reweighted Learning
E Jenner, T Lieberum, FP Nolte, N Rutsch