OpenAlex · Updated hourly · Last updated: 22 May 2026, 12:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ChatGPT Performance on 120 Interdisciplinary Allergology Questions—Systematic Evaluation With Clinical Error Impact Assessment for Critical Erroneous AI-Guided Chatbot Advice

2025 · 4 citations · The Journal of Allergy and Clinical Immunology In Practice · Open Access
Citations: 4
Authors: 9
Year: 2025

Abstract

BACKGROUND: ChatGPT (Chatbot with Generative Pretrained Transformer), despite not being a medical device, may be used by patients for medical inquiries. Its accessibility and convenience, particularly amidst long waiting times for allergology appointments, make it an attractive but potentially erroneous source of advice.

OBJECTIVES: This study evaluates ChatGPT's performance on allergological questions from clinical practice, offering a systematic approach to rating its errors. An Allergological Error Impact Assessment is proposed to analyze the potential consequences of these errors on patients.

METHODS: A total of 120 multidisciplinary allergology questions from dermatology, pediatrics, and pulmonology were prompted to ChatGPT (3.5). Errors were assessed in terms of content, accuracy (ACC), completeness (CO), perceived humanness (PHU), and readability (Flesch Reading Ease). Erroneous responses were categorized on a 3-step severity scale (minor, major, and critical). Critical errors underwent allergological error impact analysis. Statistical evaluation included descriptive analyses and Kruskal-Wallis and Mann-Whitney U tests.

RESULTS: ChatGPT demonstrated good accuracy (mean ACC 4.1/5, standard deviation: 0.78, range: 1-5). CO and PHU were sufficient but lowest for pediatric queries. Readability was at an academic level for most responses. Six critical errors were identified: 1 in dermatology, 2 in pediatrics, and 3 in pulmonology. Notably, a critical pediatric food allergen error carried a potentially life-threatening risk.

CONCLUSION: ChatGPT's imperfect reliability in allergology highlights the need for expert counseling in specialized fields. Tailoring these tools to allergy use cases could improve utility of models like ChatGPT for clinical applications, such as answering questions from allergological routine care.
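
The abstract reports readability as Flesch Reading Ease without stating the formula. As a rough illustration, the standard formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), where lower scores mean harder text; scores of roughly 30 to 50 are conventionally read as college level. The sketch below is not the study's tooling: the function names and the vowel-group syllable heuristic are assumptions made for illustration, and dedicated readability tools count syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a heuristic assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually does not add a syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula applied to English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


simple = flesch_reading_ease("The cat sat on the mat.")
dense = flesch_reading_ease(
    "Immunological hypersensitivity necessitates comprehensive allergological evaluation."
)
print(f"simple: {simple:.1f}, dense: {dense:.1f}")
```

Short sentences of one-syllable words score high, while a sentence packed with long technical terms scores far lower, which is the pattern behind the "academic level" rating in the Results.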

Similar works