OpenAlex · Updated hourly · Last updated: 30.04.2026, 21:52

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating large language models for interpreting and applying caries guidelines in clinical decision support

2026 · 0 citations · BMC Oral Health · Open Access
Open full text at the publisher

0

Citations

8

Authors

2026

Year

Abstract

To systematically evaluate the capability of three large language models (LLMs)—GPT-4o, Grok-3, and DeepSeek—in interpreting and translating clinical practice guidelines for caries management and in supporting clinical decision-making, thereby exploring their potential role in disseminating dental knowledge and assisting clinical practice. Based on the American Dental Association’s guideline on nonrestorative treatments for carious lesions, a zero-shot prompting strategy—where models are given tasks without prior examples or fine-tuning—was employed to objectively assess their intrinsic ability to interpret and apply guideline knowledge. Each model was instructed to generate guideline summaries tailored for both healthcare professionals and the general public, as well as to provide diagnoses and treatment plans for three standardized clinical cases. All outputs were independently evaluated by three experienced endodontists across five dimensions (accuracy, clarity, conciseness, logical coherence, and overall quality) using a 0–10 scale. Automated text consistency was also assessed using ROUGE-L and BLEU metrics. In clinical case analyses, GPT-4o achieved a significantly higher composite score compared to DeepSeek and Grok-3, demonstrating superior diagnostic accuracy and clinically relevant treatment recommendations. This performance gap may be attributed to GPT-4o’s more robust clinical reasoning architecture and better alignment with guideline logic. For public-facing summaries, GPT-4o excelled in clarity and coherence, while DeepSeek achieved the highest accuracy in preserving source terminology. For professional summaries, all models performed comparably, with DeepSeek leading in automated metrics. Large language models demonstrate promising potential in translating dental guidelines and assisting clinical decision-making, with GPT-4o showing particularly strong performance in complex clinical reasoning tasks. 
However, limitations include variability in personalized recommendation generation and occasional mechanistic application of guidelines. The findings underscore the need for enhanced model interpretability, integration of multimodal data, and development of human-AI collaborative frameworks to optimize the balance between standardized and personalized oral disease management.
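The abstract reports automated text-consistency scoring with ROUGE-L and BLEU alongside the expert ratings. As an illustration only, the following is a minimal pure-Python sketch of ROUGE-L (longest-common-subsequence overlap between a model summary and the reference guideline text); the function names and whitespace tokenization are assumptions for this sketch, not the authors' actual tooling, which is not described on this page.

```python
# Hypothetical sketch of the ROUGE-L metric mentioned in the abstract:
# F1 over the longest common subsequence (LCS) of candidate and reference
# tokens. Tokenization here is simple whitespace splitting (an assumption).

def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A candidate summary identical to the reference scores 1.0; a summary sharing no tokens scores 0.0, with partial overlap scoring in between. Published implementations (e.g. the `rouge-score` package) add stemming and sentence-level variants beyond this sketch.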

Related works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · Electronic Health Records Systems · Clinical practice guidelines implementation