Training language models to follow instructions with human feedback L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ... Advances in neural information processing systems 35, 27730-27744, 2022 | 10593 | 2022 |
GPT-4 Technical Report OpenAI https://arxiv.org/abs/2303.08774, 2023 | 6980* | 2023 |
A holistic approach to undesired content detection in the real world T Markov, C Zhang, S Agarwal, FE Nekoul, T Lee, S Adler, A Jiang, ... Proceedings of the AAAI Conference on Artificial Intelligence 37 (12), 15009 …, 2023 | 170 | 2023 |
Training language models to follow instructions with human feedback, March 2022 L Ouyang, J Wu, X Jiang, D Almeida, CL Wainwright, P Mishkin, C Zhang, ... URL http://arxiv.org/abs/2203.02155 92, 0 | 51 | |
An Efficient Adversarial Attack for Tree Ensembles C Zhang, H Zhang, CJ Hsieh Advances in Neural Information Processing Systems (NeurIPS) 2020, 2020 | 33 | 2020 |
GPT-4V(ision) System Card OpenAI https://cdn.openai.com/papers/GPTV_System_Card.pdf, 2023 | 26 | 2023 |
New and improved content moderation tooling T Markov, C Zhang, S Agarwal, T Eloundou, T Lee, S Adler, A Jiang, ... OpenAI. https://openai.com/blog/new-and-improved-content-moderation-tooling …, 2022 | 17 | 2022 |
Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation C Zhang, J Zhao, H Zhang, KW Chang, CJ Hsieh NAACL 2021, 2021 | 12 | 2021 |
GPT-4o System Card A Hurst, A Lerer, AP Goucher, A Perelman, A Ramesh, A Clark, AJ Ostrow, ... arXiv preprint arXiv:2410.21276, 2024 | 10 | 2024 |
Systems and methods for language model-based content classification T Markov, C Zhang, S Agarwal, FME Nekoul, T Lee, S Adler, A Jiang, ... US Patent App. 18/308,586, 2024 | | 2024 |
On the Robustness of Robustness and Counterfactual Bias Evaluation C Zhang University of California, Los Angeles, 2021 | | 2021 |