This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative study of the performance of ChatGPT-4, Claude, Gemini, Mistral, and Perplexity on multiple-choice questions in cardiology
Citations: 0
Authors: 8
Year: 2025
Abstract
BACKGROUND: Artificial intelligence, particularly Large Language Models (LLMs), has revolutionized the field of medicine. Their ability to understand and answer medical questions is generating growing interest, especially in cardiology, where diagnostic and therapeutic accuracy is essential. OBJECTIVE: The objective of our study was to assess and compare the performance of five LLMs on multiple-choice questions (MCQs) in cardiology. MATERIALS AND METHODS: This was a comparative study conducted in the cardiology department of the Bogodogo University Hospital, Ouagadougou, involving 83 MCQs derived from the 2020 French national cardiology curriculum. The questions were submitted to ChatGPT-4, Claude, Gemini, Mistral, and Perplexity. Performance was evaluated based on overall and thematic accuracy, as well as the number of discordances. Agreement between the LLMs was assessed using the Kruskal-Wallis test. RESULTS: Claude achieved the highest overall accuracy (78.31%), followed by ChatGPT-4 and Gemini (75.90% each), then Mistral (72.29%) and Perplexity (68.67%). Each LLM demonstrated a distinct performance profile by topic, with Claude excelling in heart failure (100%) and arrhythmias (90.9%), and ChatGPT-4 in diagnostic investigations (87.5%). The analysis of discordances showed a slightly higher precision for ChatGPT-4. The Kruskal-Wallis test with effect size revealed statistically significant differences in performance between the LLMs, both globally and by topic (p < 0.05), with generally large effect sizes. CONCLUSION: Despite variations in their performance profiles, the five LLMs studied have relatively similar capabilities for answering well-structured cardiology multiple-choice questions. They could therefore be valuable tools in medical education in our resource-limited context.
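The overall accuracies above can be checked against the 83-question total with simple arithmetic. The sketch below is only an illustration: the abstract does not state the number of correct answers per model, so the counts it derives are inferred from the reported percentages, and the names `reported_accuracy`, `implied_correct`, and `implied_counts` are hypothetical.

```python
# Total number of MCQs, as stated in the abstract.
TOTAL_QUESTIONS = 83

# Overall accuracies (in percent) as reported in the abstract.
reported_accuracy = {
    "Claude": 78.31,
    "ChatGPT-4": 75.90,
    "Gemini": 75.90,
    "Mistral": 72.29,
    "Perplexity": 68.67,
}


def implied_correct(percent, total=TOTAL_QUESTIONS):
    """Return the whole number of correct answers closest to a reported percentage.

    Assumption: each accuracy is (correct answers / total) rounded to two decimals.
    """
    return round(percent / 100 * total)


# Inferred counts; these are NOT given in the abstract.
implied_counts = {model: implied_correct(p) for model, p in reported_accuracy.items()}

for model, count in implied_counts.items():
    # Recompute the percentage from the inferred count to confirm it matches.
    recomputed = round(100 * count / TOTAL_QUESTIONS, 2)
    print(f"{model}: {count}/{TOTAL_QUESTIONS} = {recomputed}%")
```

Under this assumption, every reported percentage corresponds to a whole number of correct answers, and the gap between the highest and lowest overall scores amounts to 8 questions out of 83.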
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,773 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,682 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,242 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations