OpenAlex · Updated hourly · Last updated: 2026-04-06, 11:01

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Contrasting the performance of mainstream Large Language Models in Radiology Board Examinations (Preprint)

2024 · 0 citations · Open Access
Open full text at the publisher

Citations: 0 · Authors: 1 · Year: 2024

Abstract

BACKGROUND: Artificial Intelligence advancements have enabled Large Language Models to significantly impact radiology education and diagnostic accuracy.

OBJECTIVE: This study evaluates the performance of mainstream Large Language Models, including GPT-4, Claude, Bard, Tongyi Qianwen, and Gemini Pro, in radiology board exams.

METHODS: A comparative analysis of 150 multiple-choice questions from radiology board exams without images was conducted. Models were assessed on accuracy in text-based questions categorized by cognitive levels and medical specialties using chi-square tests and ANOVA.

RESULTS: GPT-4 achieved the highest accuracy (83.3%), significantly outperforming others. Tongyi Qianwen also performed well (70.7%). Performance varied across question types and specialties, with GPT-4 excelling in both lower-order and higher-order questions, while Claude and Bard struggled with complex diagnostic questions.

CONCLUSIONS: GPT-4 and Tongyi Qianwen show promise in medical education and training. The study emphasizes the need for domain-specific training datasets to enhance large models' effectiveness in specialized fields like radiology.
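To illustrate the kind of chi-square comparison the METHODS section describes, the sketch below tests whether two models' accuracies on the 150 questions differ significantly. This is not the study's published analysis; the correct-answer counts (125/150 and 106/150) are reconstructed from the reported 83.3% and 70.7% accuracies, and the use of `scipy.stats.chi2_contingency` is an assumption about tooling.

```python
from scipy.stats import chi2_contingency

def compare_models(correct_a: int, correct_b: int, total: int):
    """Chi-square test on a 2x2 contingency table of
    correct vs. incorrect answers for two models."""
    table = [
        [correct_a, total - correct_a],  # model A: correct, incorrect
        [correct_b, total - correct_b],  # model B: correct, incorrect
    ]
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p

# Reconstructed counts: GPT-4 ~125/150 (83.3%), Tongyi Qianwen ~106/150 (70.7%)
chi2, p = compare_models(125, 106, 150)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With these reconstructed counts the test rejects equal accuracy at the conventional 0.05 level, consistent with the abstract's claim that GPT-4 significantly outperformed the other models.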


Topics

Artificial Intelligence in Healthcare and Education
Radiology practices and education
Radiomics and Machine Learning in Medical Imaging