This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Tests of large language models' medical competence and application for clinical decision support of musculoskeletal rehabilitation
Citations: 0
Authors: 12
Year: 2026
Abstract
Objective: Large language models (LLMs) are currently abundant and diverse, yet clinicians lack clarity on which perform best, and the expertise of general-purpose LLMs in musculoskeletal rehabilitation remains uncertain. This study aims to investigate the potential and correctness of LLMs in clinical application and to evaluate whether LLMs could assist primary rehabilitation therapists in preparing for rehabilitation examinations. Method: Eight primary doctors and therapists tested 10 LLMs in the first test, five senior doctors and therapists assessed the answers in the second test, and five primary therapists acted as examinees in the third test. We assessed the quality of case analysis across six dimensions: Case Understanding, Clinical Reasoning, Primary Diagnosis, Differential Diagnosis, Treatment Plan Accuracy and Safety, and Guidelines & Consensus. Results: < 0.001). In the second test, Doubao 1.5 pro achieved relatively high scores in both cases, and LLMs gained high scores in "Case Understanding", "Clinical Reasoning", and "Diagnosis". In the third test, primary therapists achieved a mean accuracy rate of 76.9%, while Doubao 1.5 pro improved its accuracy rate to 85.8%. Conclusions: Doubao 1.5 pro demonstrated competent ability and promising application prospects, and was assessed as the best LLM for answering musculoskeletal rehabilitation questions. We also demonstrated that the response quality of local-language LLMs was significantly better than that of English LLMs when answering questions in the local language.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,663 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,576 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,091 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,859 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations