Steven Basart
PhD, University of Chicago
Verified email at ttic.edu
Title
Cited by
Year
Measuring massive multitask language understanding
D Hendrycks, C Burns, S Basart, A Zou, M Mazeika, D Song, J Steinhardt
arXiv preprint arXiv:2009.03300, 2020
Cited by: 2503; Year: 2020
The many faces of robustness: A critical analysis of out-of-distribution generalization
D Hendrycks, S Basart, N Mu, S Kadavath, F Wang, E Dorundo, R Desai, ...
Proceedings of the IEEE/CVF international conference on computer vision …, 2021
Cited by: 1650; Year: 2021
Natural adversarial examples
D Hendrycks, K Zhao, S Basart, J Steinhardt, D Song
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
Cited by: 1511; Year: 2021
Measuring mathematical problem solving with the MATH dataset
D Hendrycks, C Burns, S Kadavath, A Arora, S Basart, E Tang, D Song, ...
arXiv preprint arXiv:2103.03874, 2021
Cited by: 1025; Year: 2021
Improving and Assessing Anomaly Detectors for Large-Scale Settings
D Hendrycks, S Basart, M Mazeika, A Zou, J Kwon, M Mostajabi, ...
Cited by: 497*; Year: 2022
Measuring coding challenge competence with APPS
D Hendrycks, S Basart, S Kadavath, M Mazeika, A Arora, E Guo, C Burns, ...
arXiv preprint arXiv:2105.09938, 2021
Cited by: 482; Year: 2021
Aligning AI with shared human values
D Hendrycks, C Burns, S Basart, A Critch, J Li, D Song, J Steinhardt
arXiv preprint arXiv:2008.02275, 2020
Cited by: 430; Year: 2020
Representation engineering: A top-down approach to AI transparency
A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ...
arXiv preprint arXiv:2310.01405, 2023
Cited by: 223; Year: 2023
DIODE: A dense indoor and outdoor depth dataset
I Vasiljevic, N Kolkin, S Zhang, R Luo, H Wang, FZ Dai, AF Daniele, ...
arXiv preprint arXiv:1908.00463, 2019
Cited by: 209; Year: 2019
Testing robustness against unforeseen adversaries
D Kang, Y Sun, D Hendrycks, T Brown, J Steinhardt
Cited by: 148; Year: 2019
Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark
A Pan, JS Chan, A Zou, N Li, S Basart, T Woodside, H Zhang, S Emmons, ...
International Conference on Machine Learning, 26837-26867, 2023
Cited by: 117; Year: 2023
HarmBench: A standardized evaluation framework for automated red teaming and robust refusal
M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu, E Sakhaee, N Li, ...
arXiv preprint arXiv:2402.04249, 2024
Cited by: 114; Year: 2024
The WMDP benchmark: Measuring and reducing malicious use with unlearning
N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ...
arXiv preprint arXiv:2403.03218, 2024
Cited by: 66; Year: 2024
Measuring massive multitask language understanding, 2020
D Hendrycks, C Burns, S Basart, A Zou, M Mazeika, D Song, J Steinhardt
arXiv preprint arXiv:2009.03300, 2020
Cited by: 52; Year: 2020
How would the viewer feel? Estimating wellbeing from video scenarios
M Mazeika, E Tang, A Zou, S Basart, JS Chan, D Song, D Forsyth, ...
Advances in Neural Information Processing Systems 35, 18571-18585, 2022
Cited by: 14; Year: 2022
Scaling out-of-distribution detection for real-world settings
S Basart, M Mazeika, M Mostajabi, J Steinhardt, D Song
International Conference on Machine Learning, 2022
Cited by: 10; Year: 2022
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
R Ren, S Basart, A Khoja, A Gatti, L Phan, X Yin, M Mazeika, A Pan, ...
arXiv preprint arXiv:2407.21792, 2024
Cited by: 5; Year: 2024
A quantitative measure of generative adversarial network distributions
D Hendrycks, S Basart
Cited by: 4; Year: 2017
Evaluating Robustness to Unforeseen Adversarial Attacks
M Kaufmann, D Kang, Y Sun, X Yin, S Basart, M Mazeika, A Dziedzic, ...
Year: 2023
Towards Robustness of Neural Networks
S Basart
The University of Chicago, 2021
Year: 2021