This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
The role of artificial intelligence in medical education: an evaluation of Large Language Models (LLMs) on the Turkish Medical Specialty Training Entrance Exam
4
Citations
3
Authors
2025
Year
Abstract
OBJECTIVE: To evaluate the performance of four advanced large language models (LLMs), OpenAI ChatGPT 4, Google AI Gemini 1.5 Pro, Cohere Command R+ and Meta AI Llama 3 70B, on questions from the Turkish Medical Specialty Training Entrance Exam (2021, 1st semester) and to analyze their answers for user interpretability in languages other than English.
METHODS: The study used questions from the Basic Medical Sciences and Clinical Medical Sciences exams of the Turkish Medical Specialty Training Entrance Exam held on March 21, 2021. The 240 questions were presented to the LLMs in Turkish, and their responses were evaluated against the official answers published by the Student Selection and Placement Centre.
RESULTS: ChatGPT 4 was the best-performing model, with an overall accuracy of 88.75%. Llama 3 70B followed with 79.17% accuracy and Gemini 1.5 Pro with 78.13%, while Command R+ lagged at 50%. ChatGPT 4 demonstrated strengths in both basic and clinical medical science questions. Performance varied with question difficulty, with ChatGPT 4 maintaining high accuracy even on the most challenging questions.
CONCLUSIONS: ChatGPT 4 and Llama 3 70B achieved satisfactory results on the Turkish Medical Specialty Training Entrance Exam, demonstrating their potential as safe sources for basic medical sciences and clinical medical sciences knowledge in languages other than English. These LLMs could be valuable resources for medical education and clinical support in non-English speaking areas. Gemini 1.5 Pro and Command R+ show potential but need significant improvement to compete with the best-performing models.
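The scoring described in the methods, comparing each model response with the official answer key and reporting the share of matches, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `accuracy` and the placeholder answer lists are hypothetical, and the count of 213 correct answers is inferred from the reported 88.75% of 240 questions rather than stated in the abstract.

```python
def accuracy(responses, answer_key):
    """Return the percentage of responses that match the answer key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return 100.0 * correct / len(answer_key)


# Hypothetical placeholder data, not the real exam: 240 questions keyed "A",
# with a model that answers 213 correctly and 27 incorrectly.
answer_key = ["A"] * 240
responses = ["A"] * 213 + ["B"] * 27

print(f"{accuracy(responses, answer_key):.2f}%")  # 88.75%
```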
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations