This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluating large language models for interpreting and applying caries guidelines in clinical decision support
Citations: 0
Authors: 8
Year: 2026
Abstract
To systematically evaluate the capability of three large language models (LLMs)—GPT-4o, Grok-3, and DeepSeek—in interpreting and translating clinical practice guidelines for caries management and in supporting clinical decision-making, thereby exploring their potential role in disseminating dental knowledge and assisting clinical practice.

Based on the American Dental Association’s guideline on nonrestorative treatments for carious lesions, a zero-shot prompting strategy—where models are given tasks without prior examples or fine-tuning—was employed to objectively assess their intrinsic ability to interpret and apply guideline knowledge. Each model was instructed to generate guideline summaries tailored for both healthcare professionals and the general public, as well as to provide diagnoses and treatment plans for three standardized clinical cases. All outputs were independently evaluated by three experienced endodontists across five dimensions (accuracy, clarity, conciseness, logical coherence, and overall quality) using a 0–10 scale. Automated text consistency was also assessed using ROUGE-L and BLEU metrics.

In clinical case analyses, GPT-4o achieved a significantly higher composite score than DeepSeek and Grok-3, demonstrating superior diagnostic accuracy and clinically relevant treatment recommendations. This performance gap may be attributed to GPT-4o’s more robust clinical reasoning architecture and better alignment with guideline logic. For public-facing summaries, GPT-4o excelled in clarity and coherence, while DeepSeek achieved the highest accuracy in preserving source terminology. For professional summaries, all models performed comparably, with DeepSeek leading in automated metrics.

Large language models demonstrate promising potential in translating dental guidelines and assisting clinical decision-making, with GPT-4o showing particularly strong performance in complex clinical reasoning tasks. However, limitations include variability in personalized recommendation generation and occasional mechanistic application of guidelines. The findings underscore the need for enhanced model interpretability, integration of multimodal data, and development of human-AI collaborative frameworks to optimize the balance between standardized and personalized oral disease management.
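The automated consistency assessment mentioned in the abstract can be illustrated with a minimal ROUGE-L computation. This is a sketch only: the study's exact tooling, tokenization, and BLEU configuration are not specified here, and the texts below are hypothetical examples, not data from the paper.

```python
def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F1 score: overlap between two texts measured by the
    longest common subsequence (LCS) of their whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    # Dynamic-programming table for LCS length
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == c else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(ref)][len(cand)]
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)   # fraction of candidate tokens in the LCS
    recall = lcs / len(ref)       # fraction of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: guideline sentence vs. a model-generated summary sentence
score = rouge_l("the cat sat on the mat", "the cat is on the mat")
```

In practice, such a score would be computed between each model-generated summary and the source guideline text; higher values indicate closer lexical agreement, complementing the endodontists' manual ratings.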
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,549 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,443 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,941 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations