Can Rager
Research Assistant
Verified email at northeastern.edu - Homepage
Title
Cited by
Year
Sparse feature circuits: Discovering and editing interpretable causal graphs in language models
S Marks, C Rager, EJ Michaud, Y Belinkov, D Bau, A Mueller
arXiv preprint arXiv:2403.19647, 2024
Cited by 87, 2024
Attribution patching outperforms automated circuit discovery
A Syed, C Rager, A Conmy
BlackBoxNLP 2024, 2024
Cited by 48, 2024
The quest for the right mediator: A history, survey, and theoretical grounding of causal interpretability
A Mueller, J Brinkmann, M Li, S Marks, K Pal, N Prakash, C Rager, ...
arXiv preprint arXiv:2408.01416, 2024
Cited by 13, 2024
Measuring progress in dictionary learning for language model interpretability with board game models
A Karvonen, B Wright, C Rager, R Angell, J Brinkmann, L Smith, ...
NeurIPS 2024, 2024
Cited by 13, 2024
Nnsight and ndif: Democratizing access to foundation model internals
JF Fiotto-Kaufman, AR Loftus, E Todd, J Brinkmann, K Pal, D Troitskii, ...
The Thirteenth International Conference on Learning Representations, 2024
Cited by 9, 2024
Linearly structured world representations in maze-solving transformers
M Ivanitskiy, AF Spies, T Räuker, G Corlouer, C Mathwin, L Quirke, ...
Proceedings of UniReps: the First Workshop on Unifying Representations in …, 2024
Cited by 8*, 2024
The advantage of foraging myopically
CL Rager, U Bhat, O Bénichou, S Redner
Journal of Statistical Mechanics: Theory and Experiment 2018 (7), 073501, 2018
Cited by 7, 2018
A Configurable Library for Generating and Manipulating Maze Datasets
M Igorevich Ivanitskiy, R Shah, AF Spies, T Räuker, D Valentine, C Rager, ...
arXiv e-prints, arXiv: 2309.10498, 2023
Cited by 4*, 2023
An adversarial example for direct logit attribution: Memory management in gelu-4l
J Dao, YT Lau, C Rager, J Janiak
arXiv preprint arXiv:2310.07325, 2023
Cited by 2, 2023
Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks
A Karvonen, C Rager, S Marks, N Nanda
arXiv preprint arXiv:2411.18895, 2024
Cited by 1, 2024
NNsight and NDIF: Democratizing access to open-weight foundation model internals
J Fiotto-Kaufman, AR Loftus, E Todd, J Brinkmann, K Pal, D Troitskii, ...
arXiv preprint arXiv:2407.14561, 2024
Cited by 1, 2024
Safety of self-assembled neuromorphic hardware
C Rager, K Webster
arXiv preprint arXiv:2301.10201, 2023
2023
Articles 1–12