Hannah Rose Kirk
Verified email at oii.ox.ac.uk - Homepage
Title | Cited by | Year
Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models
HR Kirk, Y Jun, F Volpin, H Iqbal, E Benussi, F Dreyer, A Shtedritski, ...
Advances in neural information processing systems 34, 2611-2624, 2021
227 | 2021
Auditing large language models: a three-layered approach
J Mökander, J Schuett, HR Kirk, L Floridi
AI and Ethics 4 (4), 1085-1115, 2024
214 | 2024
The benefits, risks and bounds of personalizing the alignment of large language models to individuals
HR Kirk, B Vidgen, P Röttger, SA Hale
Nature Machine Intelligence 6 (4), 383-392, 2024
192* | 2024
XSTest: A test suite for identifying exaggerated safety behaviours in large language models
P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy
Proceedings of the 2024 Conference of the North American Chapter of the …, 2023
153 | 2023
DataPerf: Benchmarks for data-centric AI development
M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ...
Advances in Neural Information Processing Systems 36, 5320-5347, 2023
139 | 2023
SemEval-2023 Task 10: Explainable detection of online sexism
HR Kirk, W Yin, B Vidgen, P Röttger
Best Paper Award, Proceedings of the 17th International Workshop on Semantic …, 2023
139 | 2023
A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning
H Berg, SM Hall, Y Bhalgat, W Yang, HR Kirk, A Shtedritski, M Bain
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the …, 2022
108 | 2022
The PRISM alignment dataset: What participatory, representative and individualised human feedback reveals about the subjective and multicultural alignment of large language models
HR Kirk, A Whitefield, P Röttger, AM Bean, K Margatina, ...
Best Paper Award, Advances in Neural Information Processing Systems 37 …, 2025
88* | 2025
Political compass or spinning arrow? Towards more meaningful evaluations for values and opinions in large language models
P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy
Outstanding Paper Award, Proceedings of the 62nd Annual Meeting of the …, 2024
66 | 2024
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale
Proceedings of the 2022 Conference of the North American Chapter of the …, 2021
64 | 2021
Handling and Presenting Harmful Text in NLP
HR Kirk, A Birhane, B Vidgen, L Derczynski
EMNLP Findings, 2022
51* | 2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
C Borchers, DS Gala, B Gilburt, E Oravkin, W Bounsi, YM Asano, HR Kirk
Proceedings of the 4th workshop on gender bias in natural language …, 2022
50 | 2022
Introducing v0.5 of the AI Safety Benchmark from MLCommons
B Vidgen, A Agrawal, AM Ahmed, V Akinwande, N Al-Nuaimi, N Alfaraj, ...
arXiv preprint arXiv:2404.12241, 2024
40 | 2024
The past, present and better future of feedback learning in large language models for subjective human preferences and values
HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale
Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023
38 | 2023
Assessing language model deployment with risk cards
L Derczynski, HR Kirk, V Balachandran, S Kumar, Y Tsvetkov, MR Leiser, ...
arXiv preprint arXiv:2303.18190, 2023
36 | 2023
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset
HR Kirk, Y Jun, P Rauba, G Wachtel, R Li, X Bai, N Broestl, M Doff-Sotta, ...
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 2021
35 | 2021
SimpleSafetyTests: a test suite for identifying critical safety risks in large language models
B Vidgen, N Scherrer, HR Kirk, R Qian, A Kannappan, SA Hale, P Röttger
arXiv preprint arXiv:2311.08370, 2023
32 | 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
SM Hall, F Gonçalves Abrantes, H Zhu, G Sodunke, A Shtedritski, HR Kirk
Advances in Neural Information Processing Systems 36, 63687-63723, 2023
31 | 2023
Adversarial nibbler: An open red-teaming method for identifying diverse harms in text-to-image generation
J Quaye, A Parrish, O Inel, C Rastogi, HR Kirk, M Kahng, E Van Liemt, ...
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and …, 2024
24* | 2024
Balancing the picture: Debiasing vision-language datasets with synthetic contrast sets
B Smith, M Farinha, SM Hall, HR Kirk, A Shtedritski, M Bain
arXiv preprint arXiv:2305.15407, 2023
22 | 2023
Articles 1–20