This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

AI-based diagnostic evaluation of GPT-4o for crown-fracture detection on maxillary periapical radiographs: effects of prompt detail and customization

2026 · 0 citations · BMC Oral Health · Open Access


Abstract

BACKGROUND/AIM: Artificial intelligence (AI) and large language models (LLMs) are rapidly entering dental imaging workflows. We conducted a diagnostic evaluation of GPT-4o for crown-fracture detection on periapical radiographs and examined how prompt detail and customization (prompt-based; no fine-tuning) affect performance in a positives-only dataset.

METHODS: In this single-center, retrospective study, 90 anonymized maxillary periapical radiographs, each showing at least one crown fracture, were evaluated by standard GPT-4o (GPT-4o) and customized GPT-4o (CGPT-4o). Both variants were accessed via a commercial interface (no API parameter control). Customization was achieved via a custom GPT with task instructions and in-context examples; no fine-tuning of model parameters was performed. Two prompts were used: a detailed prompt (DP) and a short prompt (SP). Crown-fracture detection on the periapical radiographs was compared across four test groups (GPT-4o + DP, GPT-4o + SP, CGPT-4o + DP, CGPT-4o + SP). Each group evaluated the 90 radiographs in 5 independent runs, yielding 1800 responses in total (4 × 90 × 5). Three pediatric dentists scored the outputs on an ordinal rubric (0 = incorrect, 1 = partially correct, 2 = correct). The reference standard was the consensus of these experts' blinded, independent assessments. A proportional-odds mixed model assessed the main and interaction effects of Model and Prompt on the odds of higher ordinal correctness, with random intercepts for radiograph (and runs) and adjustment for fracture grade (G1-G3).

RESULTS: Both the main effects and the interaction effect of Model and Prompt were statistically significant. CGPT-4o was associated with higher odds of ordinal correctness than GPT-4o, and detailed prompts were associated with higher odds of ordinal correctness than short prompts. The significant Model×Prompt interaction indicates that correctness depended on the specific model-prompt pairing. Among the four combinations, GPT-4o with short prompts showed the lowest odds of correctness, whereas no statistically significant differences were observed among the remaining three combinations.

CONCLUSIONS: The crown-fracture detection performance of GPT-4o was significantly affected by prompt design and customization. With short prompts in particular, customization improved detection performance considerably, and using detailed prompts with the standard GPT-4o also improved ordinal correctness. These findings highlight the critical importance of task-oriented configuration and prompt engineering in the clinical application of AI-based language models in dental traumatology. The dataset comprised only positive cases from a single center and was limited to the maxillary anterior region. Accordingly, an ordinal (0-1-2) localization outcome was used; specificity and ROC-AUC could not be estimated, and external validity (generalizability) is limited.
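
To make the METHODS description more concrete, the following is a minimal Python sketch, under stated assumptions, of how a proportional-odds (cumulative logit) model turns Model, Prompt and Model×Prompt terms into probabilities for the 0/1/2 rubric scores, and of how the 4 groups × 90 radiographs × 5 runs design yields 1800 responses. The function names (`category_probs`, `linear_predictor`), the coefficients in `BETA` and the cut-points in `THRESHOLDS` are hypothetical placeholders, not the study's fitted estimates; the fracture-grade adjustment and the random intercepts for radiograph and run are simply left at zero.

```python
import math

# Hypothetical illustration only: every coefficient and threshold below is an
# assumed placeholder, NOT an estimate reported in the study.

# Design described in the abstract: 2 models x 2 prompts, 90 radiographs, 5 runs.
GROUPS = [(model, prompt) for model in ("GPT-4o", "CGPT-4o") for prompt in ("SP", "DP")]
N_RADIOGRAPHS = 90
N_RUNS = 5
TOTAL_RESPONSES = len(GROUPS) * N_RADIOGRAPHS * N_RUNS  # 4 * 90 * 5 = 1800


def logistic(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(eta: float, thresholds: tuple[float, float]) -> list[float]:
    """Return P(score = 0), P(score = 1), P(score = 2) under a cumulative logit model.

    Uses logit P(Y <= c) = theta_c - eta for c in {0, 1}, so a larger linear
    predictor eta shifts probability toward higher (more correct) scores.
    """
    theta0, theta1 = thresholds
    if theta0 >= theta1:
        raise ValueError("thresholds must be strictly increasing")
    p_le0 = logistic(theta0 - eta)
    p_le1 = logistic(theta1 - eta)
    return [p_le0, p_le1 - p_le0, 1.0 - p_le1]


def linear_predictor(customized: int, detailed: int, beta: dict[str, float],
                     grade_shift: float = 0.0, u_radiograph: float = 0.0,
                     u_run: float = 0.0) -> float:
    """Model, Prompt and Model x Prompt fixed effects, a hypothetical fracture-grade
    shift, and random intercepts (left at 0: an 'average' radiograph and run)."""
    return (beta["model"] * customized
            + beta["prompt"] * detailed
            + beta["interaction"] * customized * detailed
            + grade_shift + u_radiograph + u_run)


# Assumed values, chosen only to mimic the qualitative pattern in the abstract
# (GPT-4o + SP lowest, the other three combinations similar).
BETA = {"model": 1.0, "prompt": 1.0, "interaction": -1.0}
THRESHOLDS = (-1.0, 1.0)

probs_by_group = {}
for model, prompt in GROUPS:
    eta = linear_predictor(int(model == "CGPT-4o"), int(prompt == "DP"), BETA)
    probs_by_group[(model, prompt)] = category_probs(eta, THRESHOLDS)
    print(f"{model} + {prompt}: P(score = 2) = {probs_by_group[(model, prompt)][2]:.3f}")
```

With these placeholder values the script prints P(score = 2) = 0.269 for GPT-4o + SP and 0.500 for the other three combinations. Because the model is proportional-odds, one linear predictor shifts both cumulative logits by the same amount, so a single odds ratio per effect summarizes the shift toward higher scores. Software packages differ in sign convention (some write theta_c + eta), so the study's fitted model may be parameterized differently.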

Topics

Dental Radiography and Imaging · Artificial Intelligence in Healthcare and Education · Dental Research and COVID-19
