OpenAlex · Updated hourly · Last updated: 20 May 2026, 14:05

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Multi‐model Artificial Intelligence Evaluation in Sudden Sensorineural Hearing Loss

2026 · 2 citations · Otolaryngology · Open Access

Citations: 2 · Authors: 4 · Year: 2026

Abstract

Objective: To compare the diagnostic accuracy, linguistic clarity, and user satisfaction of three large language models (ChatGPT‐4.0, Claude 3.7 Sonnet, and OpenAI Mini 3) in managing sudden sensorineural hearing loss.

Study Design: Prospective, multi‐domain comparative analysis using blinded expert evaluation.

Setting: Online artificial intelligence (AI) platforms accessed under standardized conditions.

Methods: Twenty‐seven sudden sensorineural hearing loss‐related questions, covering general knowledge, audiometric interpretation, and clinical case scenarios, were submitted to the three AI models. Responses were evaluated by 10 board‐certified otolaryngologists using three validated tools: Quality Assessment of Medical Artificial Intelligence (QAMAI), Artificial Intelligence Performance Instrument (AIPI), and Artificial Intelligence Satisfaction and Performance Evaluation Questionnaire (AISPE‐Q). Linguistic complexity was assessed using metrics such as word count, sentence length, lexical diversity, and clinical verb use.

Results: ChatGPT‐4.0 achieved the highest scores in clinical accuracy (QAMAI: 4.57), completeness (4.53), and evaluator satisfaction (AISPE‐Q: 94%). Claude 3.7 outperformed the other models in clarity and sentence complexity, while OpenAI Mini 3 exhibited the highest lexical diversity and the most directive tone but scored lower overall. Inter‐rater reliability was strong (intraclass correlation coefficient [ICC] > 0.85). Correlation analysis revealed a significant relationship between objective quality and subjective satisfaction (r > 0.76).

Conclusion: ChatGPT‐4.0 delivered the most clinically aligned and satisfactory responses, whereas Claude 3.7 provided linguistically refined outputs. Our findings support the context‐specific application of hybrid large language model approaches in otolaryngology, particularly for patient education, diagnosis, and AI‐driven triage.

Level of Evidence: 2 (prospective comparative diagnostic accuracy study).
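The abstract names word count, sentence length, and lexical diversity as linguistic metrics but does not say how they were computed. The following is a minimal sketch of one common way to compute them, not the authors' procedure. It assumes lexical diversity means the type-token ratio (unique words divided by total words) and uses crude regular-expression tokenization. The function name `linguistic_metrics` is hypothetical.

```python
import re


def linguistic_metrics(text: str) -> dict:
    """Return word count, mean sentence length (in words), and type-token ratio.

    Assumptions (not taken from the study): sentences end at '.', '!' or '?',
    words are runs of letters, digits, or apostrophes, and lexical diversity
    is measured as the type-token ratio.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase so that "Start" and "start" count as the same word type.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    word_count = len(words)
    return {
        "word_count": word_count,
        "mean_sentence_length": word_count / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(set(words)) / word_count if word_count else 0.0,
    }


if __name__ == "__main__":
    sample = "Start steroids early. Repeat the audiogram. Start early."
    print(linguistic_metrics(sample))
```

The type-token ratio depends on text length, so comparing responses of very different lengths with it calls for caution. A length-normalized measure may be preferable in practice.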
