This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
AI-based diagnostic evaluation of GPT-4o for crown-fracture detection on maxillary periapical radiographs: effects of prompt detail and customization
Citations: 0
Authors: 3
Year: 2026
Abstract
BACKGROUND/AIM: Artificial intelligence (AI) and large language models (LLMs) are rapidly entering dental imaging workflows. We conducted a diagnostic evaluation of GPT-4o for crown-fracture detection on periapical radiographs and examined how prompt detail and prompt-based customization (no fine-tuning) affect performance in a positives-only dataset.

METHODS: In this single-center, retrospective study, 90 anonymized maxillary periapical radiographs, each showing at least one crown fracture, were evaluated by standard GPT-4o (GPT-4o) and a customized GPT-4o (CGPT-4o). Both variants were accessed via a commercial interface (no API parameter control). Customization was achieved via a custom GPT with task instructions and in-context examples; no model parameters were fine-tuned. Two prompts were used: a detailed prompt (DP) and a short prompt (SP). This gave four test groups (GPT-4o + DP, GPT-4o + SP, CGPT-4o + DP, CGPT-4o + SP), whose performance in detecting crown fractures on periapical radiographs was compared. Each group evaluated the 90 radiographs in 5 independent runs, yielding 1800 responses in total (4 × 90 × 5). Outputs were scored on an ordinal rubric (0 = incorrect, 1 = partially correct, 2 = correct) by three pediatric dentists. The reference standard was the blinded, independent assessment of these experts with consensus. A proportional-odds mixed model assessed the main and interaction effects of Model and Prompt on the odds of higher ordinal correctness, with random intercepts for radiograph (and runs) and adjustment for fracture grade (G1-G3).

RESULTS: The main effects of Model and Prompt and their interaction were all statistically significant. CGPT-4o yielded higher odds of ordinal correctness than GPT-4o, and detailed prompts were associated with higher odds of ordinal correctness than short prompts. The significant Model×Prompt interaction indicated that correctness depended on the specific model-prompt pairing: among the four combinations, GPT-4o with the short prompt had the lowest odds of correctness, whereas no statistically significant differences were observed among the remaining three combinations.

CONCLUSIONS: The crown-fracture detection performance of GPT-4o was significantly affected by prompt design and customization. Customization improved detection performance considerably, especially with short prompts, and using detailed prompts with the standard GPT-4o improved ordinal correctness. The findings demonstrate the critical importance of task-oriented configuration and prompt engineering in the clinical application of AI-based language models in dental traumatology. The dataset comprised only positive cases from a single center and was limited to the maxillary anterior region. Accordingly, we used an ordinal (0-1-2) localization outcome; specificity and ROC-AUC could not be estimated, and external validity (generalizability) is limited.
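The 2 × 2 design described in METHODS can be enumerated in a few lines to check the stated response count. This is only an illustrative sketch: the constant and function names are hypothetical, and the radiograph and run identifiers are placeholder integers, not the study's data.

```python
from itertools import product

# Factor levels as named in the abstract.
MODELS = ("GPT-4o", "CGPT-4o")
PROMPTS = ("DP", "SP")
N_RADIOGRAPHS = 90  # positives-only radiographs
N_RUNS = 5          # independent runs per group


def enumerate_design():
    """Return one (model, prompt, radiograph_id, run) tuple per expected response.

    Identifiers are placeholder integers (assumption), used only for counting.
    """
    return [
        (model, prompt, radiograph, run)
        for model, prompt in product(MODELS, PROMPTS)
        for radiograph in range(1, N_RADIOGRAPHS + 1)
        for run in range(1, N_RUNS + 1)
    ]


design = enumerate_design()
groups = sorted({(model, prompt) for model, prompt, _, _ in design})

print(f"groups: {len(groups)}, responses: {len(design)}")
```

Each tuple corresponds to one response that would later receive an ordinal score of 0, 1, or 2.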
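For readers unfamiliar with proportional-odds mixed models, one common way to write such a model is sketched below. The notation is an assumption and is not taken from the paper: $Y_{ik}$ is the ordinal score (0, 1, 2) for radiograph $i$ in run $k$, $M$ and $P$ are indicators for the customized model and the detailed prompt, $g_i$ encodes fracture grade (G1-G3), and $u_i$, $v_k$ are random intercepts. The abstract does not state how the run effect was specified, so the separate $v_k$ term is a guess.

```latex
\operatorname{logit} \Pr(Y_{ik} \le c)
  = \theta_c
  - \bigl( \beta_M M_{ik} + \beta_P P_{ik} + \beta_{MP} M_{ik} P_{ik}
         + \boldsymbol{\gamma}^{\top} g_i + u_i + v_k \bigr),
  \qquad c \in \{0, 1\},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
v_k \sim \mathcal{N}(0, \sigma_v^2).
```

With this sign convention, a positive coefficient means higher odds of a higher correctness score, and $\beta_{MP}$ captures the Model×Prompt interaction reported in RESULTS.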