This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of Large Language Models in Interventional Cardiology: The ILLUMINATE Blinded Model-Comparison Study
0
Citations
14
Authors
2025
Year
Abstract
OBJECTIVES: Large language models (LLMs) have the potential to assist in complex decision making for interventional cardiology (IC). However, their comparative performance in providing clinical recommendations remains uncertain. In this blinded model-comparison study, the authors evaluated and compared the quality of recommendations produced by 6 LLMs for complex IC cases.
METHODS: Twenty detailed and complex clinical cases focusing on coronary artery disease (n = 10) and structural heart disease (n = 10) were developed. Six LLMs were tested: default ChatGPT (ChatGPTd), ChatGPT with European Society of Cardiology guidelines (ChatGPT-gl), ChatGPT with internet search enabled (ChatGPTi), Gemini (Google), Mistral 7B (Mistral AI), and Perplexity AI (Perplexity AI, Inc.). To ensure blinding, outputs were anonymized and only their ordering was randomized. Five expert interventional cardiologists independently assessed the anonymized and randomized responses on a 0 to 10 scale for appropriateness, accuracy, relevance, clarity, and clinical utility, generating a composite score. Statistical analysis was performed using a mixed linear model.
RESULTS: Six hundred blinded evaluations (20 cases × 6 models × 5 raters) were analyzed, yielding an overall composite score of 7.1 (95% CI, 7.0-7.2). Performance varied significantly across LLMs (P < .001), with ChatGPTi (7.8 [7.5-8.0]) and ChatGPT-gl (7.7 [7.4-7.9]) outperforming the others. ChatGPTd (6.9 [6.6-7.3]), Mistral 7B (7.0 [6.7-7.3]), and Perplexity AI (7.0 [6.7-7.3]) performed moderately, while Gemini had the lowest score (6.3 [6.0-6.7]). These differences were consistent across all scoring dimensions (P < .001). Case type did not affect LLM performance (P = .900).
CONCLUSIONS: LLMs show promise in IC decision making, but their performance remains suboptimal. Maximizing their potential requires systematic integration of web search capabilities and guideline-based knowledge retrieval.
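The evaluation design in the abstract (20 cases, 6 models, 5 raters, five dimensions scored 0 to 10) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the composite score was derived, so the unweighted mean used here is an assumption, and the names `composite_score` and `evaluation_grid` as well as the sample ratings are hypothetical. The mixed linear model itself is not reproduced.

```python
from itertools import product

# Scoring dimensions and model labels as named in the abstract.
DIMENSIONS = ("appropriateness", "accuracy", "relevance", "clarity", "clinical_utility")
MODELS = ("ChatGPTd", "ChatGPT-gl", "ChatGPTi", "Gemini", "Mistral 7B", "Perplexity AI")


def composite_score(ratings):
    """Return the unweighted mean of the five dimension scores.

    ASSUMPTION: the abstract does not specify the aggregation rule;
    a plain mean is used here purely for illustration.
    """
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    values = [ratings[d] for d in DIMENSIONS]
    if any(not 0 <= v <= 10 for v in values):
        raise ValueError("each score must lie on the 0 to 10 scale")
    return sum(values) / len(values)


def evaluation_grid(n_cases=20, models=MODELS, n_raters=5):
    """Enumerate every (case, model, rater) combination of the design."""
    cases = range(1, n_cases + 1)
    raters = range(1, n_raters + 1)
    return list(product(cases, models, raters))


if __name__ == "__main__":
    # Hypothetical ratings from one rater for one response (not study data).
    example = {
        "appropriateness": 7,
        "accuracy": 8,
        "relevance": 6,
        "clarity": 9,
        "clinical_utility": 5,
    }
    print(composite_score(example))
    print(len(evaluation_grid()))
```

The grid size matches the 600 blinded evaluations reported in the abstract; any weighting of dimensions or handling of rater effects would need the full article.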
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,644 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,550 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,061 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,850 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations