A broad-coverage challenge corpus for sentence understanding through inference A Williams, N Nangia, SR Bowman arXiv preprint arXiv:1704.05426, 2017 | 4709 | 2017 |
SuperGLUE: A stickier benchmark for general-purpose language understanding systems A Wang, Y Pruksachatkun, N Nangia, A Singh, J Michael, F Hill, O Levy, ... Advances in neural information processing systems 32, 2019 | 2317 | 2019 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 1143 | 2022 |
CrowS-Pairs: A challenge dataset for measuring social biases in masked language models N Nangia, C Vania, R Bhalerao, SR Bowman arXiv preprint arXiv:2010.00133, 2020 | 606 | 2020 |
BBQ: A hand-built bias benchmark for question answering A Parrish, A Chen, N Nangia, V Padmakumar, J Phang, J Thompson, ... arXiv preprint arXiv:2110.08193, 2021 | 280 | 2021 |
ListOps: A diagnostic dataset for latent tree learning N Nangia, SR Bowman arXiv preprint arXiv:1804.06028, 2018 | 139 | 2018 |
The RepEval 2017 shared task: Multi-genre natural language inference with sentence representations N Nangia, A Williams, A Lazaridou, SR Bowman arXiv preprint arXiv:1707.08172, 2017 | 112 | 2017 |
Human vs. Muppet: A conservative estimate of human performance on the GLUE benchmark N Nangia, SR Bowman arXiv preprint arXiv:1905.10425, 2019 | 109 | 2019 |
QuALITY: Question answering with long input texts, yes! RY Pang, A Parrish, N Joshi, N Nangia, J Phang, A Chen, V Padmakumar, ... arXiv preprint arXiv:2112.08608, 2021 | 106 | 2021 |
jiant 1.2: A software toolkit for research on general-purpose text understanding models A Wang, IF Tenney, Y Pruksachatkun, K Yu, J Hula, P Xia, R Pappagari, ... Note: http://jiant.info/ Cited by: footnote 4, 2019 | 54 | 2019 |
Does putting a linguist in the loop improve NLU data collection? A Parrish, W Huang, O Agha, SH Lee, N Nangia, A Warstadt, K Aggarwal, ... arXiv preprint arXiv:2104.07179, 2021 | 40 | 2021 |
What do NLP researchers believe? Results of the NLP community metasurvey J Michael, A Holtzman, A Parrish, A Mueller, A Wang, A Chen, D Madaan, ... arXiv preprint arXiv:2208.12852, 2022 | 31 | 2022 |
What ingredients make for an effective crowdsourcing protocol for difficult NLU data collection tasks? N Nangia, S Sugawara, H Trivedi, A Warstadt, C Vania, SR Bowman arXiv preprint arXiv:2106.00794, 2021 | 29 | 2021 |
A broad-coverage challenge corpus for sentence understanding through inference. arXiv 2017 A Williams, N Nangia, SR Bowman arXiv preprint arXiv:1704.05426, 2017 | 18 | 2017 |
The multi-genre NLI corpus A Williams, N Nangia, SR Bowman | 17 | 2018 |
Single-turn debate does not help humans answer hard reading-comprehension questions A Parrish, H Trivedi, E Perez, A Chen, N Nangia, J Phang, SR Bowman arXiv preprint arXiv:2204.05212, 2022 | 14 | 2022 |
What Makes Reading Comprehension Questions Difficult? S Sugawara, N Nangia, A Warstadt, SR Bowman arXiv preprint arXiv:2203.06342, 2022 | 13 | 2022 |
Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions A Parrish, H Trivedi, N Nangia, V Padmakumar, J Phang, AS Saimbhi, ... arXiv preprint arXiv:2210.10860, 2022 | 10 | 2022 |
Crowdsourcing beyond annotation: Case studies in benchmark data collection A Suhr, C Vania, N Nangia, M Sap, M Yatskar, S Bowman, Y Artzi Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021 | 9 | 2021 |
Latent structure models for natural language processing AFT Martins, T Mihaylova, N Nangia, V Niculae Proceedings of the 57th Annual Meeting of the Association for Computational …, 2019 | 7 | 2019 |