SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models P Manakul, A Liusie, MJF Gales EMNLP 2023, 2023 | 644 | 2023 |
LLM comparative assessment: Zero-shot NLG evaluation through pairwise comparisons using large language models A Liusie, P Manakul, MJF Gales arXiv preprint arXiv:2307.07889, 2023 | 54 | 2023 |
Is LLM-as-a-judge robust? Investigating universal adversarial attacks on zero-shot LLM assessment V Raina, A Liusie, M Gales arXiv preprint arXiv:2402.14016, 2024 | 42 | 2024 |
MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization P Manakul, A Liusie, MJF Gales IJCNLP-AACL 2023, 2023 | 35 | 2023 |
Rewarding Chatbots for Real-World Engagement with Millions of Users R Irvine, D Boubert, V Raina, A Liusie, V Mudupalli, A Korshuk, Z Liu, ... arXiv preprint arXiv:2303.06135, 2023 | 22 | 2023 |
Zero-shot NLG evaluation through Pairwise Comparisons with LLMs A Liusie, P Manakul, MJF Gales EACL 2024, 2023 | 17 | 2023 |
Analyzing Biases to Spurious Correlations in Text Classification Tasks A Liusie, V Raina, V Raina, M Gales IJCNLP-AACL 2022, 2022 | 15 | 2022 |
Blending is all you need: Cheaper, better alternative to trillion-parameters LLM X Lu, Z Liu, A Liusie, V Raina, V Mudupalli, Y Zhang, W Beauchamp arXiv preprint arXiv:2401.02994, 2024 | 13 | 2024 |
CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models P Manakul, Y Fathullah, A Liusie, V Raina, V Raina, M Gales BioNLP Workshop @ ACL 2023, 2023 | 13 | 2023 |
WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models P Molenda, A Liusie, MJF Gales NAACL 2024 (findings), 2024 | 11 | 2024 |
Mitigating Word Bias in Zero-shot Prompt-based Classifiers A Liusie, P Manakul, MJF Gales IJCNLP-AACL 2023, 2023 | 9 | 2023 |
The Cambridge multiple-choice questions reading dataset A Mullooly, Ø Andersen, L Benedetto, P Buttery, A Caines, MJF Gales, ... Cambridge University Press and Assessment, 2023 | 9 | 2023 |
Efficient LLM comparative assessment: a product of experts framework for pairwise comparisons A Liusie, V Raina, Y Fathullah, M Gales arXiv preprint arXiv:2405.05894, 2024 | 8 | 2024 |
Teacher-student training for debiasing: General permutation debiasing for large language models A Liusie, Y Fathullah, MJF Gales arXiv preprint arXiv:2403.13590, 2024 | 6 | 2024 |
Investigating the Emergent Audio Classification Ability of ASR Foundation Models R Ma, A Liusie, MJF Gales, KM Knill NAACL 2024, 2023 | 6 | 2023 |
Analysis of the Cambridge multiple-choice questions reading dataset with a focus on candidate response distribution A Liusie, V Raina, A Mullooly, K Knill, MJF Gales arXiv preprint arXiv:2306.13047, 2023 | 5 | 2023 |
Automatic assessment of conversational speaking tests SW McKnight, A Civelekoglu, M Gales, S Bannó, A Liusie, KM Knill Proc. of the Workshop on Speech and Language Technology in Education (SLaTE …, 2023 | 4 | 2023 |
"World Knowledge" in Multiple Choice Reading Comprehension A Liusie, V Raina, M Gales Proceedings of the Sixth Fact Extraction and VERification Workshop (FEVER …, 2022 | 4 | 2022 |
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models G Sun, P Manakul, A Liusie, K Pipatanakul, C Zhang, P Woodland, ... arXiv preprint arXiv:2405.13684, 2024 | 2 | 2024 |
Who Needs Decoders? Efficient Estimation of Sequence-level Attributes Y Fathullah, P Radmard, A Liusie, MJF Gales arXiv preprint arXiv:2305.05098, 2023 | 2 | 2023 |