This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Real-world feasibility of generative large language models for clinical decision support in benign prostatic hyperplasia
Citations: 0
Authors: 10
Year: 2025
Abstract
BACKGROUND: Benign prostatic hyperplasia (BPH) is a common condition among middle-aged and elderly men, often accompanied by lower urinary tract symptoms that significantly impair patients' quality of life. It has emerged as a major public health challenge worldwide. In recent years, artificial intelligence (AI), particularly large language models (LLMs), has shown great potential in supporting clinical decision-making. This study aimed to systematically evaluate how well mainstream generative LLMs have mastered clinical knowledge of BPH and to explore their decision-support capabilities in real-world clinical scenarios.
METHODS: We assessed the clinical knowledge and decision-making capabilities of ChatGPT o1 and DeepSeek R1 in the field of BPH. A set of 30 clinically relevant questions was developed and submitted to both AI models. For comparison, clinical physicians from Chinese medical institutions answered the same questions under closed-book conditions. Additionally, six real-world BPH cases were used to simulate clinical diagnostic and treatment scenarios and to further evaluate the two models' performance in clinical decision-making.
RESULTS: In the clinical knowledge assessment, both ChatGPT o1 and DeepSeek R1 outperformed the physician group, with no significant difference between the two models. Subgroup analyses revealed performance differences across clinical knowledge categories and physician experience levels. DeepSeek R1 and ChatGPT o1 generally outperformed resident physicians and performed comparably to attending physicians. In clinical decision support, DeepSeek R1 outperformed ChatGPT o1 in both accuracy of medical knowledge and logical coherence.
CONCLUSION: This study shows that, in BPH-related clinical knowledge, ChatGPT o1 and DeepSeek R1 perform at levels approaching those of attending physicians. Both models demonstrate substantial potential in supporting clinical decision-making, with DeepSeek R1 performing particularly well. Despite existing limitations, the application of AI in healthcare holds considerable promise.