This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Analyzing the performance of multimodal large language models on visually-based questions in the Japanese National Examination for Dental Technicians
Citations: 7
Authors: 10
Year: 2025
Abstract
Background/purpose: Large language models (LLMs) offer promising applications in dentistry, but their performance in specialized, image-rich contexts such as dental technology examinations remains uncertain. The purpose of this study was to evaluate the accuracy of three multimodal LLMs, ChatGPT-4o (4o), OpenAI o1 (o1), and Claude 3.5 Sonnet (Sonnet), when presented with questions from the Japanese National Examination for Dental Technicians.

Materials and methods: A total of 240 multiple-choice questions from the 2022 to 2024 theory sections of the exam were used. Each question, including its accompanying figures or images where applicable, was presented to the three LLMs in a zero-shot manner without specialized prompting. Correct response rates were calculated overall, as well as by question type (text-only vs. visually-based) and subject area. Statistical comparisons were performed using Cochran's Q test, followed by McNemar's test with Bonferroni correction where indicated.

Results: = 0.017). In contrast, all models showed reduced accuracy on visually-based questions (44.6-55.4%), with no significant difference among them.

Conclusion: These results suggest that multimodal LLMs can supplement theoretical dental technology education, although their limited performance on visual tasks indicates the need for traditional hands-on training. Enhanced image interpretation skills may help address workforce challenges in dental technology.
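The statistical comparison named in the methods (Cochran's Q test across the three models, then pairwise McNemar's tests with Bonferroni correction when the overall test is significant) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the answer vectors are invented toy data, the model labels are placeholders, the 0.05 significance level is an assumption, and the exact (binomial) form of McNemar's test is an assumption because the abstract does not say which variant was used. The chi-square p-value uses the closed form exp(-x/2), which holds only for 2 degrees of freedom, that is, for exactly three models.

```python
from itertools import combinations
from math import comb, exp

ALPHA = 0.05  # assumed significance level; not stated in the abstract


def cochrans_q(rows):
    """Cochran's Q for paired binary outcomes.

    rows: one tuple per question, one 0/1 entry per model.
    Returns (Q statistic, degrees of freedom).
    """
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    denom = k * n - sum(t * t for t in row_totals)
    if denom == 0:  # every question answered identically by all models
        return 0.0, k - 1
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, k - 1


def chi2_sf_df2(x):
    """Chi-square survival function; this closed form is valid for df = 2 only."""
    return exp(-x / 2)


def mcnemar_exact(a, b):
    """Two-sided exact McNemar p-value from the discordant pairs of a and b."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = only_a + only_b
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented toy data: 1 = correct, 0 = incorrect, ten questions per model.
answers = {
    "model_a": [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    "model_b": [1, 1, 0, 1, 0, 1, 0, 0, 1, 0],
    "model_c": [1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
}

rows = list(zip(*answers.values()))
q_stat, df = cochrans_q(rows)
p_overall = chi2_sf_df2(q_stat)  # valid here because df == 2
print(f"Cochran's Q = {q_stat:.2f}, df = {df}, p = {p_overall:.4f}")

# Pairwise follow-up only when the overall test is significant.
adjusted = {}
if p_overall < ALPHA:
    pairs = list(combinations(answers, 2))
    for m1, m2 in pairs:
        p = mcnemar_exact(answers[m1], answers[m2])
        adjusted[(m1, m2)] = min(1.0, p * len(pairs))  # Bonferroni
        print(f"{m1} vs {m2}: adjusted p = {adjusted[(m1, m2)]:.4f}")
```

Both tests operate on paired outcomes, which fits this design because every model answers the same set of questions. For a real analysis, an established statistics package would be preferable to these hand-written functions.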