OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 24.05.2026, 18:57

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

ChatGPT-4 Turbo and Meta’s LLaMA 3.1: A Relative Analysis of Answering Radiology Text-Based Questions

2024·1 Zitationen·CureusOpen Access
Volltext beim Verlag öffnen

1

Zitationen

4

Autoren

2024

Jahr

Abstract

AIMS AND OBJECTIVES: This study aimed to compare the accuracy of two AI models - OpenAI's GPT-4 Turbo (San Francisco, CA) and Meta's LLaMA 3.1 (Menlo Park, CA) - when answering a standardized set of pediatric radiology questions. The primary objective was to evaluate the overall accuracy of each model, while the secondary objective was to assess their performance within subsections. METHODS AND MATERIALS: A total of 79 text-based pediatric radiology questions were selected out of 302 total questions for this comparison. The questions covered seven subsections, including musculoskeletal, chest, and neuroradiology, among others. Image-based questions were excluded to focus on text interpretation and to minimize the sampling bias within each model. Each model was tested independently on the same question set, and the percent accuracy was calculated for both overall performance as well as individual subsections. RESULTS: GPT-4 Turbo performed at an overall accuracy of 88.6% (70/79 questions), outperforming LLaMA 3.1's 77.2% (61/79). Within subsections, GPT-4 Turbo had higher accuracy in most areas, except for equal accuracy in the neuroradiology section. The subsections with the greatest accuracy for GPT-4 Turbo, in descending order, were chest and cardiac radiology (100%), musculoskeletal system (93.3%), and genitourinary system (92.9%). LLaMA 3.1's highest performance was 86.7% in the musculoskeletal system, while its lowest was 50.0% in chest radiology. CONCLUSION: GPT-4 Turbo consistently outperformed LLaMA 3.1 in answering pediatric radiology questions, both overall and within most subsections. These findings suggest that GPT-4 Turbo may offer more accurate responses in specialized medical education, in contrast to LLaMA 3.1's efficient performance, although future research should further evaluate AI models' performance within other fields.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationRadiology practices and educationClinical Reasoning and Diagnostic Skills
Volltext beim Verlag öffnen