Alex Tamkin
Research Scientist, Anthropic
Verified email at cs.stanford.edu
Title / Cited by / Year
On the opportunities and risks of foundation models
R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ...
arXiv preprint arXiv:2108.07258, 2021
4961 · 2021
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
T Bricken, A Templeton, J Batson, B Chen, A Jermyn, T Conerly, ...
https://transformer-circuits.pub/2023/monosemantic-features/index.html, 2023
336 · 2023
Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet
A Templeton, T Conerly, J Marcus, J Lindsey, T Bricken, B Chen, ...
Transformer Circuits Thread, 2024
241 · 2024
Towards measuring the representation of subjective global opinions in language models
E Durmus, K Nguyen, TI Liao, N Schiefer, A Askell, A Bakhtin, C Chen, ...
arXiv preprint arXiv:2306.16388, 2023
186 · 2023
Studying large language model generalization with influence functions
R Grosse, J Bae, C Anil, N Elhage, A Tamkin, A Tajdini, B Steiner, D Li, ...
arXiv preprint arXiv:2308.03296, 2023
139 · 2023
Many-shot jailbreaking
C Anil, E Durmus, N Panickssery, M Sharma, J Benton, S Kundu, J Batson, ...
Advances in Neural Information Processing Systems 37, 129696-129742, 2024
107 · 2024
Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy
R Keramati, C Dann, A Tamkin, E Brunskill
Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 2020
100 · 2020
Viewmaker Networks: Learning Views for Unsupervised Representation Learning
A Tamkin, M Wu, N Goodman
ICLR 2021, 2020
81 · 2020
Drone.io: A Gestural and Visual Interface for Human-Drone Interaction
JR Cauchard, A Tamkin, CY Wang, L Vink, M Park, T Fang, JA Landay
2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019
65 · 2019
Evaluating and mitigating discrimination in language model decisions
A Tamkin, A Askell, L Lovitt, E Durmus, N Joseph, S Kravec, K Nguyen, ...
arXiv preprint arXiv:2312.03689, 2023
61 · 2023
Investigating transferability in pretrained language models
A Tamkin, T Singh, D Giovanardi, N Goodman
Findings of EMNLP 2020, 2020
54 · 2020
Eliciting human preferences with language models
BZ Li, A Tamkin, N Goodman, J Andreas
arXiv preprint arXiv:2310.11589, 2023
53 · 2023
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
A Tamkin, M Brundage, J Clark, D Ganguli
arXiv preprint arXiv:2102.02503, https://arxiv.org/abs/2102.02503, 2021
50* · 2021
Language Through a Prism: A Spectral Approach for Multiscale Language Representations
A Tamkin, D Jurafsky, N Goodman
NeurIPS 2020, 2020
45 · 2020
DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
A Tamkin, V Liu, R Lu, D Fein, C Schultz, N Goodman
NeurIPS 2021, 2021
41 · 2021
Distributionally-Aware Exploration for CVaR Bandits
A Tamkin, R Keramati, C Dann, E Brunskill
NeurIPS 2019 Workshop on Safety and Robustness in Decision Making, 2019
41 · 2019
Active Learning Helps Pretrained Models Learn the Intended Task
A Tamkin, D Nguyen, S Deshpande, J Mu, N Goodman
NeurIPS 2022, 2022
40 · 2022
C5T5: Controllable generation of organic molecules with transformers
D Rothchild, A Tamkin, J Yu, U Misra, J Gonzalez
arXiv preprint arXiv:2108.10307, 2021
37 · 2021
Recursive Routing Networks: Learning to Compose Modules for Language Understanding
I Cases, C Rosenbaum, M Riemer, A Geiger, T Klinger, A Tamkin, O Li, ...
NAACL 2019, 2019
34 · 2019
Collective Constitutional AI: Aligning a language model with public input
S Huang, D Siddarth, L Lovitt, TI Liao, E Durmus, A Tamkin, D Ganguli
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and …, 2024
29 · 2024