This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Clinical feasibility of AI Doctors: Evaluating the replacement potential of large language models in outpatient settings for central nervous system tumors
Citations: 11
Authors: 6
Year: 2025
Abstract
BACKGROUND AND OBJECTIVES: The treatment of central nervous system (CNS) tumors is complex and resource-intensive, with higher mortality in underserved regions. Large language models (LLMs) show promise in medical support, but their real-world performance in CNS tumor outpatient care remains unclear. This study aims to assess the diagnostic and treatment capabilities of LLMs in bilingual clinical settings.

METHODS: This retrospective study evaluated three LLMs (ChatGPT-4o, DeepSeek-R1, and Doubao) in assisting neuro-oncology outpatient decision-making within bilingual (Chinese/English) clinical environments. A total of 338 outpatient cases were included, with each model assigned three clinical tasks: differential diagnosis, main diagnosis, and treatment advice. Model outputs were compared against assessments by experienced neurosurgeons. Statistical analysis employed McNemar tests (P < 0.05).

RESULTS: ChatGPT-4o and DeepSeek-R1 achieved over 90% accuracy in differential diagnosis, showing no significant difference from the doctors (P > 0.05), while Doubao performed significantly worse (Chinese: P = 0.02, English: P = 0.01). In main diagnosis, both ChatGPT-4o and DeepSeek-R1 showed no significant deviation from the doctors' performance (P > 0.05), whereas Doubao underperformed (Chinese: P = 0.019, English: P = 0.011). For treatment recommendations, all models showed reduced accuracy (ChatGPT-4o: 80.5%; DeepSeek-R1: 79%; Doubao: 71.3%), significantly lower than the doctors (both Chinese and English: P < 0.05). No performance difference was observed between Chinese and English cases.

CONCLUSION: LLMs show strong potential for preliminary diagnosis and decision support in CNS tumors, and their cross-lingual adaptability underscores their clinical feasibility.
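The comparison method named in the abstract, the McNemar test, applies to paired binary outcomes (e.g. model correct/incorrect vs. doctor correct/incorrect on the same cases). A minimal sketch of the chi-square version of the test is below; the counts `b` and `c` (the two kinds of discordant pairs) are hypothetical, not taken from the study, and the actual analysis may have used an exact or continuity-corrected variant.

```python
import math

def mcnemar_test(b: int, c: int) -> tuple[float, float]:
    """McNemar chi-square test (no continuity correction).

    b: cases where method A was correct and method B was wrong.
    c: cases where method B was correct and method A was wrong.
    Returns (chi-square statistic, two-sided p-value with 1 df).
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree
    # of freedom: P(X >= stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Hypothetical example: 10 cases model-right/doctor-wrong,
# 3 cases doctor-right/model-wrong.
stat, p = mcnemar_test(10, 3)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Only the discordant pairs enter the statistic; cases where both raters agree carry no information about which rater is better, which is why the test suits paired designs like this study's model-vs.-doctor comparison.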
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations