Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 1392 | 2022 |
Holistic evaluation of language models P Liang, R Bommasani, T Lee, D Tsipras, D Soylu, M Yasunaga, Y Zhang, ... arXiv preprint arXiv:2211.09110, 2022 | 1244 | 2022 |
Premise Order Matters in Reasoning with Large Language Models X Chen, RA Chi, X Wang, D Zhou https://arxiv.org/abs/2402.08939, 2024 | 57 | 2024 |
Holistic evaluation of language models, 2022 P Liang, R Bommasani, T Lee, D Tsipras, D Soylu, M Yasunaga, Y Zhang, ... URL https://arxiv.org/abs/2211.09110 3, 2023 | 40 | 2023 |
ModeLing: A novel dataset for testing linguistic reasoning in language models NA Chi, T Malchev, R Kong, RA Chi, L Huang, EA Chi, RT McCoy, ... arXiv preprint arXiv:2406.17038, 2024 | 8 | 2024 |
LINGOLY: A benchmark of olympiad-level linguistic reasoning puzzles in low-resource and extinct languages AM Bean, S Hellsten, H Mayne, J Magomere, EA Chi, R Chi, SA Hale, ... arXiv preprint arXiv:2406.06196, 2024 | 7 | 2024 |
Dialogue distillery: Crafting interpolable, interpretable, and introspectable dialogue from LLMs RA Chi, J Kim, S Hickmann, S Li, G Chi, T Atchariyachanvanit, K Yu, ... Alexa Prize SocialBot Grand Challenge 5, 2023 | 6 | 2023 |
Stanford MLab at SemEval 2023 task 7: Neural methods for clinical trial report NLI C Takehana, D Lim, E Kurtuluş, R Iyer, E Tanimura, P Aggarwal, ... Proceedings of the 17th International Workshop on Semantic Evaluation …, 2023 | 2 | 2023 |
RedwoodNLP at SemEval-2021 Task 7: Ensembled pretrained and lightweight models for humor detection N Chi, R Chi Proceedings of the 15th International Workshop on Semantic Evaluation …, 2021 | 2 | 2021 |
GLARE: Generative Left-to-right AdversaRial Examples RA Chi, N Kim, P Liu, Z Lack, EA Chi Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems …, 2022 | | 2022 |
Stanford MLab at SemEval 2022 Task 7: Tree- and Transformer-Based Methods for Clarification Plausibility T Yim, J Lee, R Verma, S Hickmann, A Zhu, C Sallade, I Ng, R Chi, P Liu Proceedings of the 16th International Workshop on Semantic Evaluation …, 2022 | | 2022 |
Automated Topic-Tagging for Software-Related Question-and-Answer Sites A Agrawal, RA Chi, V Gupta | | |