Evaluation of large language models as decision support tools for head and neck cancer management: A blinded multidisciplinary simulation study
1 citation · 12 authors · 2026
Abstract
• LLMs generated guideline-concordant management recommendations for complex head and neck cancer scenarios.
• Retrieval augmentation was associated with higher ratings for appropriateness and feasibility in this blinded simulation.
• Eight blinded surgeons rated outputs across the appropriateness, clarity, specificity, and feasibility domains.
• RAG-enabled models demonstrated consistent performance across repeated generations.
• Findings support LLMs as adjunctive decision-support and educational tools requiring expert oversight.

The management of head and neck cancer relies on multidisciplinary expertise; however, access to tumor boards remains variable. Large language models (LLMs) may support guideline-based decision-making, although their performance in complex oncologic scenarios is not well defined. Fourteen synthetic cases based on real tumor board encounters were evaluated. Five blinded comparator arms produced recommendations: a human expert, Non-RAG-GPT-4, Non-RAG-GPT-5, RAG-GPT-4, and RAG-GPT-5, where RAG denotes retrieval-augmented generation. Eight head and neck oncologic surgeons scored each recommendation for appropriateness, clarity, specificity, and feasibility on 5-point Likert scales. Scores were compared using paired permutation tests, and inter-rater reliability was assessed. LLM outputs showed close alignment with expert recommendations. RAG-based models achieved the highest mean scores across domains, with some statistically significant differences versus the expert comparator in appropriateness and clarity; however, absolute differences were modest. Inter-rater reliability was strong (intraclass correlation coefficient [ICC], 0.73–0.87). Advanced LLMs can generate guideline-concordant management recommendations in simulated head and neck cancer cases, supporting their potential utility for decision support and education; prospective validation and expert oversight remain essential.
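The abstract names paired permutation testing without describing the procedure. As an illustration only, the sketch below implements one common form, a two-sided sign-flip (Monte Carlo) permutation test on the mean paired difference. It assumes each comparison pairs two arms' scores on the same unit (for example, per-case mean ratings); the pairing unit, the function name `paired_permutation_test`, and the defaults `n_permutations` and `seed` are hypothetical choices, not the authors' published method.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided paired (sign-flip) permutation test on the mean difference.

    Hypothetical helper: scores_a[i] and scores_b[i] are assumed to be the
    scores two arms received on the same unit i (e.g. the same case).
    Returns (observed |mean difference|, Monte Carlo p-value).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty paired samples of equal length")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    rng = random.Random(seed)  # seeded so the p-value is reproducible
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the arm labels within each pair are
        # exchangeable, so each difference keeps or flips its sign with p = 1/2.
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(permuted)) >= observed:
            extreme += 1

    # Add-one correction keeps the Monte Carlo p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

With only 14 paired cases there are 2^14 = 16,384 possible sign patterns, so an exact test that enumerates every pattern would also be feasible; random sampling is used here only to keep the sketch short.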
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,700 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,605 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,133 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,873 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Institutions
- Sheba Medical Center (IL)
- The University of Texas Health Science Center (US)
- Università degli Studi di Enna Kore (IT)
- Thomas Jefferson University Hospital (US)
- Sidney Kimmel Cancer Center (US)
- Ospedale San Paolo (IT)
- Fondazione IRCCS Istituto Nazionale dei Tumori (IT)
- Azienda Socio Sanitaria Territoriale Grande Ospedale Metropolitano Niguarda (IT)
- Complexo Hospitalario Universitario A Coruña (ES)