This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating AI performance in infectious disease education: a comparative analysis of ChatGPT, Google Bard, Perplexity AI, Microsoft Copilot, and Meta AI

2025 · 2 citations · Frontiers in Medicine · Open Access

Abstract

Background: This study systematically evaluates and compares the performance of ChatGPT 3.5, Google Bard (Gemini), Perplexity AI, Microsoft Copilot, and Meta AI in responding to infectious disease-related multiple-choice questions (MCQs).

Methods: A systematic comparative study was conducted using 20 infectious disease case studies sourced from Infectious Diseases: A Case Study Approach by Jonathan C. Cho. Each case study included 7-10 MCQs, resulting in a total of 160 questions. AI platforms were provided with standardized prompts containing the case study text and MCQs without additional context. Their responses were evaluated against a reference answer key from the textbook. Accuracy was measured by the percentage of correct responses, and consistency was assessed by submitting identical prompts 24 h apart.

Results: ChatGPT 3.5 achieved the highest numerical accuracy (65.6%), followed by Perplexity AI (63.2%), Microsoft Copilot (60.9%), Meta AI (60.8%), and Google Bard (58.8%). AI models performed best in symptom identification (76.5%) and worst in therapy-related questions (57.1%). ChatGPT 3.5 demonstrated strong diagnostic accuracy (79.1%) but had a significant drop in antimicrobial treatment recommendations (56.6%). Google Bard performed inconsistently in microorganism identification (61.9%) and preventive therapy (62.5%). Microsoft Copilot exhibited the most stable responses across repeated testing, while ChatGPT 3.5 showed a 7.5% accuracy decline. Perplexity AI and Meta AI struggled with individualized treatment recommendations, showing variability in drug selection and dosing adjustments. AI-generated responses were found to change over time, with some models giving different antimicrobial recommendations for the same case scenario upon repeated testing.

Conclusion: AI platforms offer potential in infectious disease education but demonstrate limitations in pharmacotherapy decision-making, particularly in antimicrobial selection and dosing accuracy. ChatGPT 3.5 performed best but lacked response stability, while Microsoft Copilot showed greater consistency but lacked nuanced therapeutic reasoning. Further research is needed to improve AI-driven decision support systems for medical education and clinical applications through clinical trials, evaluation of real-world patient data, and assessment of long-term stability.
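The two measures named in the Methods (accuracy as the percentage of correct responses, and consistency across identical prompts submitted 24 h apart) can be illustrated with a minimal sketch. The abstract does not say how consistency was scored, so the `agreement` function below is an assumption, and all names and the toy data are hypothetical rather than taken from the study.

```python
from typing import Sequence


def accuracy(responses: Sequence[str], answer_key: Sequence[str]) -> float:
    """Percentage of responses that match the reference answer key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return 100.0 * correct / len(answer_key)


def agreement(first_run: Sequence[str], second_run: Sequence[str]) -> float:
    """Percentage of questions answered identically in two runs.

    Assumption: one plausible way to score consistency; the abstract
    does not specify the exact measure the authors used.
    """
    if len(first_run) != len(second_run):
        raise ValueError("both runs must cover the same questions")
    if not first_run:
        raise ValueError("runs must not be empty")
    same = sum(a == b for a, b in zip(first_run, second_run))
    return 100.0 * same / len(first_run)


# Hypothetical toy data (not from the study): five MCQs with options A-D.
answer_key = ["A", "C", "B", "D", "A"]
run_day_1 = ["A", "C", "D", "D", "A"]  # first submission
run_day_2 = ["A", "B", "D", "D", "A"]  # identical prompt, 24 h later

acc_day_1 = accuracy(run_day_1, answer_key)
acc_day_2 = accuracy(run_day_2, answer_key)
accuracy_change = acc_day_2 - acc_day_1  # in percentage points
run_agreement = agreement(run_day_1, run_day_2)

print(f"day 1 accuracy: {acc_day_1:.1f}%")
print(f"day 2 accuracy: {acc_day_2:.1f}%")
print(f"change: {accuracy_change:+.1f} percentage points")
print(f"agreement between runs: {run_agreement:.1f}%")
```

Reporting the change in percentage points alongside run-to-run agreement keeps two effects apart: a model can hold the same accuracy while changing which questions it gets right.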

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Antibiotic Use and Resistance
