This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Empowering front-line physicians with AI: Evaluating large language models in everyday ENT care
Citations: 4
Authors: 11
Year: 2026
Abstract
PURPOSE: Artificial intelligence systems known as large language models are being evaluated for clinical decision support, yet their role in emergency and primary care remains limited. Physicians in these settings often encounter ear, nose, and throat conditions where diagnostic uncertainty, unnecessary testing, and inappropriate referrals contribute to patient risk and healthcare inefficiency. This study compared the performance of advanced large language models with physicians in diagnosis, management, and referral across common and high-acuity otolaryngologic scenarios. METHODS: Twelve clinical vignettes representing routine and urgent presentations were developed and validated by otolaryngologists. One hundred practicing physicians in family medicine and emergency medicine, including residents and attending physicians, completed all vignettes by providing a diagnosis, management plan, and referral decision. Four large language models (Gemini-2.0, ChatGPT-4.0, ChatGPT-5, and OpenEvidence) were tested using identical prompts. Model outputs were anonymized, randomized, and rated by a blinded expert panel using the Quality Analysis of Medical Artificial Intelligence tool, which assesses accuracy, clarity, completeness, sourcing, relevance, and usefulness. RESULTS: Physicians achieved mean diagnostic accuracy of 91.6% and management accuracy of 87.9%. In non-urgent cases, 30.4% of responses represented inappropriate referral. Only half recognized the need for urgent referral in a cerebrospinal fluid leak scenario. Large language models demonstrated comparable diagnostic and management accuracy with higher referral appropriateness. CONCLUSIONS: Large language models showed consistent, guideline-concordant reasoning in simulated emergency and primary-care otolaryngology cases. Their potential lies in supporting, not replacing, clinical judgment through responsible integration and real-world validation.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations