Measuring massive multitask language understanding D Hendrycks, C Burns, S Basart, A Zou, M Mazeika, D Song, J Steinhardt arXiv preprint arXiv:2009.03300, 2020 | 2503 | 2020 |
The many faces of robustness: A critical analysis of out-of-distribution generalization D Hendrycks, S Basart, N Mu, S Kadavath, F Wang, E Dorundo, R Desai, ... Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 1650 | 2021 |
Natural adversarial examples D Hendrycks, K Zhao, S Basart, J Steinhardt, D Song Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 1511 | 2021 |
Measuring mathematical problem solving with the MATH dataset D Hendrycks, C Burns, S Kadavath, A Arora, S Basart, E Tang, D Song, ... arXiv preprint arXiv:2103.03874, 2021 | 1025 | 2021 |
Improving and Assessing Anomaly Detectors for Large-Scale Settings D Hendrycks, S Basart, M Mazeika, A Zou, J Kwon, M Mostajabi, ... | 497* | 2022 |
Measuring coding challenge competence with APPS D Hendrycks, S Basart, S Kadavath, M Mazeika, A Arora, E Guo, C Burns, ... arXiv preprint arXiv:2105.09938, 2021 | 482 | 2021 |
Aligning AI with shared human values D Hendrycks, C Burns, S Basart, A Critch, J Li, D Song, J Steinhardt arXiv preprint arXiv:2008.02275, 2020 | 430 | 2020 |
Representation engineering: A top-down approach to AI transparency A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ... arXiv preprint arXiv:2310.01405, 2023 | 223 | 2023 |
DIODE: A dense indoor and outdoor depth dataset I Vasiljevic, N Kolkin, S Zhang, R Luo, H Wang, FZ Dai, AF Daniele, ... arXiv preprint arXiv:1908.00463, 2019 | 209 | 2019 |
Testing robustness against unforeseen adversaries D Kang, Y Sun, D Hendrycks, T Brown, J Steinhardt | 148 | 2019 |
Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark A Pan, JS Chan, A Zou, N Li, S Basart, T Woodside, H Zhang, S Emmons, ... International Conference on Machine Learning, 26837-26867, 2023 | 117 | 2023 |
HarmBench: A standardized evaluation framework for automated red teaming and robust refusal M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu, E Sakhaee, N Li, ... arXiv preprint arXiv:2402.04249, 2024 | 114 | 2024 |
The WMDP benchmark: Measuring and reducing malicious use with unlearning N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ... arXiv preprint arXiv:2403.03218, 2024 | 66 | 2024 |
Measuring massive multitask language understanding D Hendrycks, C Burns, S Basart, A Zou, M Mazeika, D Song, J Steinhardt arXiv preprint arXiv:2009.03300, 2020 | 52 | 2020 |
How would the viewer feel? Estimating wellbeing from video scenarios M Mazeika, E Tang, A Zou, S Basart, JS Chan, D Song, D Forsyth, ... Advances in Neural Information Processing Systems 35, 18571-18585, 2022 | 14 | 2022 |
Scaling out-of-distribution detection for real-world settings S Basart, M Mazeika, M Mostajabi, J Steinhardt, D Song International Conference on Machine Learning, 2022 | 10 | 2022 |
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? R Ren, S Basart, A Khoja, A Gatti, L Phan, X Yin, M Mazeika, A Pan, ... arXiv preprint arXiv:2407.21792, 2024 | 5 | 2024 |
A quantitative measure of generative adversarial network distributions D Hendrycks, S Basart | 4 | 2017 |
Evaluating Robustness to Unforeseen Adversarial Attacks M Kaufmann, D Kang, Y Sun, X Yin, S Basart, M Mazeika, A Dziedzic, ... | | 2023 |
Towards Robustness of Neural Networks S Basart The University of Chicago, 2021 | | 2021 |