Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Assessing the Accuracy and Completeness of AI-Generated Dental Responses: An Evaluation of the Chat-GPT Model
3
Zitationen
6
Autoren
2025
Jahr
Abstract
Background: The rapid advancement of artificial intelligence (AI) in healthcare has opened new opportunities, yet the clinical validation of AI tools in dentistry remains limited. Objectives: This study aimed to assess the performance of ChatGPT in generating accurate and complete responses to academic dental questions across multiple specialties, comparing the capabilities of GPT-4 and GPT-3.5 models. Methodology: A panel of academic specialists from eight dental specialties collaboratively developed 48 clinical questions, classified by consensus as easy, medium, or hard, and as requiring either binary (yes/no) or descriptive responses. Each question was sequentially entered into both GPT-4 and GPT-3.5 models, with instructions to provide guideline-based answers. The AI-generated responses were independently evaluated by the specialists for accuracy (6-point Likert scale) and completeness (3-point Likert scale). Descriptive and inferential statistics were applied, including Mann–Whitney U and Kruskal–Wallis tests, with significance set at p < 0.05. Results: GPT-4 consistently outperformed GPT-3.5 in both evaluation domains. The median accuracy score was 6.0 for GPT-4 and 5.0 for GPT-3.5 (p = 0.02), while the median completeness score was 3.0 for GPT-4 and 2.0 for GPT-3.5 (p < 0.001). GPT-4 demonstrated significantly higher overall accuracy (5.29 ± 1.1) and completeness (2.44 ± 0.71) compared to GPT-3.5 (4.5 ± 1.7 and 1.69 ± 0.62, respectively; p = 0.024 and <0.001). When stratified by specialty, notable improvements with GPT-4 were observed in Periodontology, Endodontics, Implantology, and Oral Surgery, particularly in completeness scores. Conclusions: In academic dental settings, GPT-4 provided more accurate and complete responses than GPT-3.5. Despite both models showing potential, their clinical application should remain supervised by human experts.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.773 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.682 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8.242 Zit.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6.898 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.