This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Which curriculum components do medical students find most helpful for evaluating AI outputs?
Citations: 8 · Authors: 5 · Year: 2025
Abstract
INTRODUCTION: The risk and opportunity of Large Language Models (LLMs) in medical education both rest in their imitation of human communication. Future doctors working with generative artificial intelligence (AI) need to judge the value of any outputs from LLMs to safely direct the management of patients. We set out to investigate medical students' ability to evaluate LLM responses to clinical vignettes, identify which prior learning they utilised to scrutinise the LLM answers, and assess their awareness of 'clinical prompt engineering'. METHODS: Final year medical students were asked in a survey to assess the accuracy of the answers provided by generative pre-trained transformer (GPT) 3.5 in response to ten clinical scenarios, five of which GPT 3.5 had answered incorrectly, and to identify which prior training enabled them to evaluate the GPT 3.5 output. A content analysis was conducted amongst 148 consenting medical students. RESULTS: The median percentage of students who correctly evaluated the LLM output was 56%. Students reported interactive case-based and pathology teaching using questions to be the most helpful training provided by the medical school for evaluating AI outputs. Only 5% were familiar with the concept of 'clinical prompt engineering'. CONCLUSION: Pathology and interactive case-based teaching using questions were the self-reported best training for medical students to safely interact with the outputs of LLMs. This study can inform the design of medical training for future doctors graduating into AI-enhanced health services.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,697 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,602 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,127 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,872 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations