OpenAlex · Updated hourly · Last updated: 27.03.2026, 22:36

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

Multi-Dimensional Evaluation Of Large Language Models In Dental Implantology

2025 · 0 citations · International Dental Journal · Open Access
Open full text at the publisher

Citations: 0 · Authors: 5 · Year: 2025

Abstract

Large language models (LLMs) show promise in medicine, but their effectiveness in specialized fields such as implant dentistry remains unclear. This study systematically evaluates five LLMs (ChatGPT, DeepSeek, Grok, Gemini, and Qwen) in clinical implantology scenarios to guide their targeted application. A multi-dimensional evaluation used a test set of 40 professional questions and 5 complex cases across eight key themes. Three senior experts scored the responses of the five LLMs on five dimensions in two double-blind rounds. Inter-rater reliability was tested, followed by statistical analyses including Spearman's ρ test, the Friedman test, a mixed-effects model, and principal component analysis (PCA). High inter-rater reliability was confirmed. Gemini outperformed all other models in both question answering and case analysis, significantly exceeding ChatGPT and DeepSeek in question responses (p < 0.001) and Qwen in case evaluations (p < 0.01). Mixed-effects models further showed Gemini's superiority over ChatGPT (Estimate = 1.7, p < 0.001), while Qwen exhibited a decline in performance (Estimate = -1, p = 0.040). DeepSeek-R1 also showed positive interaction effects in specific themes. PCA revealed not only performance differences among the LLMs across clinical scenarios but also deeper correlations among the evaluation dimensions. The study shows that LLMs have differentiated capabilities in dental implantology and recommends context-specific model selection for different clinical scenarios; Gemini demonstrated the best overall performance, notably for high-level clinical support.
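The abstract names the Friedman test as one of the analyses used to compare models scored on the same items. As a rough illustration of what that test computes, the following is a minimal sketch, not the authors' code: the function names are hypothetical, the scores are made-up placeholders rather than study data, and the statistic is computed without a tie correction.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values ascending (1 = lowest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(scores: Dict[str, List[float]]) -> float:
    """Friedman chi-square statistic (no tie correction) for k models on n items."""
    models = list(scores)
    k = len(models)
    n = len(scores[models[0]])
    rank_sums = [0.0] * k
    for item in range(n):
        # Rank the models against each other within a single item.
        row = [scores[m][item] for m in models]
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical placeholder scores (NOT data from the study): 3 models, 4 questions.
placeholder_scores = {
    "model_a": [3.0, 4.0, 3.5, 4.5],
    "model_b": [2.0, 3.0, 2.5, 3.0],
    "model_c": [4.0, 5.0, 4.5, 5.0],
}
q = friedman_statistic(placeholder_scores)
```

Because every placeholder question ranks the three models in the same order, the statistic reaches its maximum for this layout, n(k - 1) = 8. In practice the statistic would be compared against a chi-square distribution with k - 1 degrees of freedom, and a statistics library would normally handle ties and p-values.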

Topics

Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Dental Radiography and Imaging