This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Benchmarking multimodal large language models on the dental licensing examination: Challenges with clinical image interpretation

2025 · 12 citations · Journal of Dental Sciences · Open Access

Abstract

Background/purpose: Large language models (LLMs) have been studied in text-based healthcare tasks, but their performance in multimodal dental applications has not yet been fully explored. This study evaluated the performance of four multimodal LLMs on dental licensing examination questions with both text-only and visually based components.

Materials and methods: Four multimodal LLMs, ChatGPT-4o (4o), OpenAI o1 (o1), Claude 3.5 Sonnet (Sonnet), and Gemini 2.0 Flash Thinking Experimental (Gemini), were tested on 353 questions from the 2024 Japanese National Dental Examination, comprising 204 text-only and 149 visually based questions spanning 17 dental specialties. A zero-shot approach was used without prompt engineering. Performance was analyzed using Cochran's Q test and McNemar's test with Bonferroni correction.

Results: o1 achieved the highest overall correct response rate (81.9%), followed by Sonnet (71.7%), Gemini (66.6%), and 4o (65.7%). All models performed significantly better on text-only questions (79.9-92.2%) than on visually based questions (45.6-67.8%). Performance varied by specialty, with the highest scores in basic medical sciences (Dental pharmacology: 100%; Oral physiology: 86.7-100%) and lower scores in clinical specialties requiring visual interpretation (Orthodontics: 36.4-66.7%).

Conclusion: Multimodal LLMs demonstrate promising performance on dental examination questions, particularly in text-based scenarios, but significant challenges remain in complex visual interpretation. The strong zero-shot performance of newer models such as o1 suggests potential applications in dental education and certain aspects of clinical decision support, although further advances are needed before reliable application in visually complex diagnostic workflows.
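The abstract names McNemar's test with Bonferroni correction for comparing models on the same set of questions. The following is a minimal sketch of how such a pairwise comparison could be set up, not the authors' analysis code. It assumes the exact (binomial) form of McNemar's test and assumes that every pair of models is compared (which would give six comparisons for four models); the abstract specifies neither. The function names and the toy outcome lists are hypothetical and are not the study's data.

```python
from itertools import combinations
from math import comb


def mcnemar_exact(a, b):
    """Two-sided exact McNemar p-value for paired binary outcomes a and b."""
    # Discordant pairs: questions where exactly one of the two models was correct.
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n = n01 + n10
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(n01, n10)
    # Under the null hypothesis, discordant pairs split like Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pairwise_bonferroni(results, alpha=0.05):
    """Run McNemar's test on every pair of models with a Bonferroni threshold."""
    pairs = list(combinations(sorted(results), 2))
    threshold = alpha / len(pairs)  # Bonferroni: divide alpha by the number of tests
    return {
        (m1, m2): (p, p < threshold)
        for m1, m2 in pairs
        for p in [mcnemar_exact(results[m1], results[m2])]
    }


# Made-up toy outcomes (1 = correct answer), NOT the study's data.
toy = {
    "model_a": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "model_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "model_c": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
}

for pair, (p_value, significant) in pairwise_bonferroni(toy).items():
    print(pair, round(p_value, 4), significant)
```

Only the discordant questions enter the test, which is why two models with similar overall accuracy can still differ significantly if they fail on different questions.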

Topics

Artificial Intelligence in Healthcare and Education · Dental Research and COVID-19 · Radiology practices and education