Dirk Groeneveld

Cited by

	All	Since 2019
Citations	1755	1604
h-index	15	14
i10-index	16	15

Citations per year
Year	Citations
2013	11
2014	18
2015	13
2016	18
2017	26
2018	44
2019	81
2020	165
2021	161
2022	219
2023	317
2024	650

Co-authors

Iz Beltagy	Allen Institute for Artificial Intelligence	Verified email at beltagy.net
Oren Etzioni	Professor, University of Washington	Verified email at cs.uw.edu
Jason Dunkelberger	Semantic Scholar	Verified email at allenai.org
Miles Crawford	Allen Institute for Artificial Intelligence	Verified email at u.washington.edu
Sergey Feldman	Allen Institute of Artificial Intelligence, Alongside Care	Verified email at data-cowboys.com
Chandra Bhagavatula	Allen Institute for Artificial Intelligence	Verified email at allenai.org
Madeleine van Zuylen	Northeastern University | Allen Institute for AI	Verified email at northeastern.edu
Matthew E Peters	Spiffy AI, Allen Institute for Artificial Intelligence	Verified email at allenai.org
Vu Ha	Applied Research & Engineering	Verified email at ai2incubator.com

Dirk Groeneveld

Allen Institute for Artificial Intelligence

Verified email at allenai.org

natural language processing, neural networks, deep learning


Title	Authors	Publication	Cited by	Year
Construction of the literature graph in Semantic Scholar	W Ammar, D Groeneveld, C Bhagavatula, I Beltagy, M Crawford, ...	arXiv preprint arXiv:1805.02262, 2018	502	2018
Documenting large webtext corpora: A case study on the Colossal Clean Crawled Corpus	J Dodge, M Sap, A Marasović, W Agnew, G Ilharco, D Groeneveld, ...	arXiv preprint arXiv:2104.08758, 2021	429	2021
From ‘F’ to ‘A’ on the NY Regents science exams: An overview of the Aristo project	P Clark, O Etzioni, T Khot, D Khashabi, B Mishra, K Richardson, ...	AI Magazine 41 (4), 39-53, 2020	121	2020
Generating search result summaries	D Groeneveld, D Meyerzon, D Mowatt	US Patent 8,285,699, 2012	120	2012
OLMo: Accelerating the science of language models	D Groeneveld, I Beltagy, P Walsh, A Bhagia, R Kinney, O Tafjord, AH Jha, ...	arXiv preprint arXiv:2402.00838, 2024	85	2024
Dolma: An open corpus of three trillion tokens for language model pretraining research	L Soldaini, R Kinney, A Bhagia, D Schwenk, D Atkinson, R Authur, ...	arXiv preprint arXiv:2402.00159, 2024	81	2024
What's In My Big Data?	Y Elazar, A Bhagia, I Magnusson, A Ravichander, D Schwenk, A Suhr, ...	arXiv preprint arXiv:2310.20707, 2023	64	2023
Name search using a ranking function	DH Groeneveld, D Meyerzon, D Mowatt, JA Alspaugh	US Patent 8,645,417, 2014	61	2014
Generating search result summaries	D Groeneveld, D Meyerzon, D Mowatt	US Patent 7,853,587, 2010	51	2010
A simple yet strong pipeline for HotpotQA	D Groeneveld, T Khot, A Sabharwal	arXiv preprint arXiv:2004.06753, 2020	46	2020
Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, et al. 2024. OLMo: Accelerating the science of language models	D Groeneveld, I Beltagy, P Walsh, A Bhagia, R Kinney, O Tafjord	arXiv preprint arXiv:2402.00838, 2024	41	2024
Ananya Harsh Jha	D Groeneveld, I Beltagy, P Walsh, A Bhagia, R Kinney, O Tafjord	Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson …, 2024	34	2024
IKE-an interactive tool for knowledge extraction	B Dalvi, S Bhakthavatsalam, C Clark, P Clark, O Etzioni, A Fader, ...	Proceedings of the 5th Workshop on Automated Knowledge Base Construction, 12-17, 2016	33	2016
DataComp-LM: In search of the next generation of training sets for language models	J Li, A Fang, G Smyrnis, M Ivgi, M Jordan, S Gadre, H Bansal, E Guha, ...	arXiv preprint arXiv:2406.11794, 2024	18	2024
Generating search result summaries	D Groeneveld, D Meyerzon, D Mowatt	US Patent 8,032,519, 2011	17	2011
Name search using a ranking function	DH Groeneveld, D Meyerzon, D Mowatt, JA Alspaugh	US Patent 9,727,639, 2017	13	2017
Dolma: An open corpus of 3 trillion tokens for language model pretraining research	L Soldaini, R Kinney, A Bhagia, D Schwenk, D Atkinson, R Authur, ...	Allen Institute for AI, Tech. Rep, 5998-6008, 2023	9	2023
Molmo and PixMo: Open weights and open data for state-of-the-art multimodal models	M Deitke, C Clark, S Lee, R Tripathi, Y Yang, JS Park, M Salehi, ...	arXiv preprint arXiv:2409.17146, 2024	7	2024
Large language model distillation doesn’t need a teacher	AH Jha, D Groeneveld, E Strubell, I Beltagy	arXiv preprint arXiv:2305.14864, 2023	4	2023
Construction of the literature graph in Semantic Scholar. NAACL	W Ammar, D Groeneveld, C Bhagavatula, I Beltagy, M Crawford, ...	URL: https://www.semanticscholar.org/paper …, 2018	4	2018
