OpenAlex · Updated hourly · Last updated: 17.05.2026, 14:34

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Empowering front-line physicians with AI: Evaluating large language models in everyday ENT care

2026 · 4 citations · The American Journal of Emergency Medicine · Open Access

4

Citations

11

Authors

2026

Year

Abstract

PURPOSE: Artificial intelligence systems known as large language models are being evaluated for clinical decision support, yet their role in emergency and primary care remains limited. Physicians in these settings often encounter ear, nose, and throat conditions where diagnostic uncertainty, unnecessary testing, and inappropriate referrals contribute to patient risk and healthcare inefficiency. This study compared the performance of advanced large language models with physicians in diagnosis, management, and referral across common and high-acuity otolaryngologic scenarios.

METHODS: Twelve clinical vignettes representing routine and urgent presentations were developed and validated by otolaryngologists. One hundred practicing physicians in family medicine and emergency medicine, including residents and attending physicians, completed all vignettes by providing a diagnosis, management plan, and referral decision. Four large language models (Gemini-2.0, ChatGPT-4.0, ChatGPT-5, and OpenEvidence) were tested using identical prompts. Model outputs were anonymized, randomized, and rated by a blinded expert panel using the Quality Analysis of Medical Artificial Intelligence tool, which assesses accuracy, clarity, completeness, sourcing, relevance, and usefulness.

RESULTS: Physicians achieved mean diagnostic accuracy of 91.6% and management accuracy of 87.9%. In non-urgent cases, 30.4% of responses represented inappropriate referral. Only half recognized the need for urgent referral in a cerebrospinal fluid leak scenario. Large language models demonstrated comparable diagnostic and management accuracy with higher referral appropriateness.

CONCLUSIONS: Large language models showed consistent, guideline-concordant reasoning in simulated emergency and primary-care otolaryngology cases. Their potential lies in supporting, not replacing, clinical judgment through responsible integration and real-world validation.
