This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
AI in Patient Care: Evaluating Large Language Model Performance Against Evidence-Based Guidelines for Pulmonary Embolism
Citations: 0
Authors: 6
Year: 2026
Abstract
OBJECTIVE: Artificial intelligence (AI)-driven large language models (LLMs) are increasingly used in patient education; however, their ability to interpret and apply clinical guidelines within real-world physician workflows remains uncertain. Pulmonary embolism (PE), with its well-established diagnostic and management protocols, provides a suitable model for evaluating these systems. This study assessed the performance of four widely used AI-driven LLMs (ChatGPT-4o, DeepSeek-V2, Gemini, and Grok) in applying the 2019 European Society of Cardiology guidelines for PE. The focus was on evaluating clinical accuracy, adherence to guidelines, and response consistency.

MATERIAL AND METHODS: Ten open-ended questions based on a simulated PE case were created, covering diagnosis, risk stratification, treatment, and follow-up. Guideline-based reference answers were used for scoring. LLMs were tested under identical conditions, and the responses were anonymized and scored by two emergency physicians using a 10-point scale. Inter-rater reliability was measured using the intraclass correlation coefficient (ICC), and group comparisons were made using Kruskal-Wallis tests.

RESULTS: = 0.390). Performance varied by category; ChatGPT-4o excelled in follow-up, while DeepSeek-V2 performed best in diagnostics. Expert reviewers noted ChatGPT-4o's structured responses and Grok's practicality, but highlighted limitations such as insufficient personalization and guideline gaps. Inter-rater agreement was excellent (ICC: 0.986).

CONCLUSION: AI-driven LLMs show promise in supporting PE management, though none consistently excel in all domains. Further development is needed to enhance clinical integration and guideline compliance.
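The two statistics named in the methods can be illustrated with a minimal, standard-library-only sketch. It is not the authors' analysis code. The abstract does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here. The function names and all scores below are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction.

    Undefined (division by zero) if every pooled value is identical.
    """
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(c ** 3 - c for c in Counter(pooled).values())
    return h / (1 - tie_term / (n ** 3 - n))


def icc_2_1(ratings):
    """ICC(2,1) for a subjects-by-raters table (assumed ICC form)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # subjects
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # raters
    sse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = sse / ((n - 1) * (k - 1))  # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical 10-point scores for four unnamed models, ten questions each.
    scores = [
        [8, 9, 7, 8, 9, 8, 7, 9, 8, 9],
        [7, 8, 8, 9, 7, 8, 8, 7, 9, 8],
        [8, 7, 7, 8, 8, 9, 7, 8, 7, 8],
        [7, 8, 9, 7, 8, 7, 8, 8, 8, 7],
    ]
    print("H =", kruskal_wallis_h(scores))
    # Hypothetical paired ratings: one row per response, one column per rater.
    paired = [[8, 8], [6, 7], [9, 9], [5, 5], [7, 7]]
    print("ICC(2,1) =", icc_2_1(paired))
```

To get a p-value, the H statistic would be compared against a chi-square distribution with (number of groups - 1) degrees of freedom. That step is left out to keep the sketch dependency-free.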