This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
ChatGPT Performance on 120 Interdisciplinary Allergology Questions—Systematic Evaluation With Clinical Error Impact Assessment for Critical Erroneous AI-Guided Chatbot Advice
Citations: 4
Authors: 9
Year: 2025
Abstract
BACKGROUND: ChatGPT (Chatbot with Generative Pretrained Transformer), despite not being a medical device, may be used by patients for medical inquiries. Its accessibility and convenience, particularly amidst long waiting times for allergology appointments, make it an attractive but potentially erroneous source of advice.

OBJECTIVES: This study evaluates ChatGPT's performance on allergological questions from clinical practice, offering a systematic approach to rating its errors. An Allergological Error Impact Assessment is proposed to analyze the potential consequences of these errors on patients.

METHODS: A total of 120 multidisciplinary allergology questions from dermatology, pediatrics, and pulmonology were prompted to ChatGPT (3.5). Errors were assessed in terms of content, accuracy (ACC), completeness (CO), perceived humanness (PHU), and readability (Flesch Reading Ease). Erroneous responses were categorized on a 3-step severity scale (minor, major, and critical). Critical errors underwent allergological error impact analysis. Statistical evaluation included descriptive analyses and Kruskal-Wallis and Mann-Whitney U tests.

RESULTS: ChatGPT demonstrated good accuracy (mean ACC 4.1/5, standard deviation: 0.78, range: 1-5). CO and PHU were sufficient but lowest for pediatric queries. Readability was at an academic level for most responses. Six critical errors were identified: 1 in dermatology, 2 in pediatrics, and 3 in pulmonology. Notably, a critical pediatric food allergen error carried a potentially life-threatening risk.

CONCLUSION: ChatGPT's imperfect reliability in allergology highlights the need for expert counseling in specialized fields. Tailoring these tools to allergy use cases could improve utility of models like ChatGPT for clinical applications, such as answering questions from allergological routine care.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,740 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,649 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,202 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,886 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations