Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
The model student: GPT-4 performance on graduate biomedical science exams
51
Zitationen
6
Autoren
2024
Jahr
Abstract
The GPT-4 large language model (LLM) and ChatGPT chatbot have emerged as accessible and capable tools for generating English-language text in a variety of formats. GPT-4 has previously performed well when applied to questions from multiple standardized examinations. However, further evaluation of trustworthiness and accuracy of GPT-4 responses across various knowledge domains is essential before its use as a reference resource. Here, we assess GPT-4 performance on nine graduate-level examinations in the biomedical sciences (seven blinded), finding that GPT-4 scores exceed the student average in seven of nine cases and exceed all student scores for four exams. GPT-4 performed very well on fill-in-the-blank, short-answer, and essay questions, and correctly answered several questions on figures sourced from published manuscripts. Conversely, GPT-4 performed poorly on questions with figures containing simulated data and those requiring a hand-drawn answer. Two GPT-4 answer-sets were flagged as plagiarism based on answer similarity and some model responses included detailed hallucinations. In addition to assessing GPT-4 performance, we discuss patterns and limitations in GPT-4 capabilities with the goal of informing design of future academic examinations in the chatbot era.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.312 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.169 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.564 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.776 Zit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5.466 Zit.