This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

ChatGPT’s Limitations in Athlete ECG Interpretation: Evidence from a Multicenter Diagnostic Study

2026 · 0 citations · Journal of Cardiovascular Development and Disease · Open Access

Abstract

Background: Artificial intelligence (AI) has shown promise in the interpretation of electrocardiograms (ECGs) using signal-based deep learning models. In parallel, large language models (LLMs) have gained increasing visibility in clinical practice, including exploratory applications in ECG analysis. Whether a general-purpose LLM can meaningfully detect cardiovascular disease from athlete ECGs during pre-participation screening (PPS) remains unknown. We aimed to evaluate the diagnostic performance of a general-purpose LLM for this task.
Methods: In this multicentre diagnostic accuracy study, we evaluated a commercially available LLM (ChatGPT, version 5) in 2950 competitive athletes undergoing PPS. All athletes underwent resting 12-lead ECG, with second- and third-line investigations performed when clinically indicated. The reference outcome was confirmed cardiovascular disease after full diagnostic work-up (n = 450, 15.3%). For each ECG, the LLM generated a numeric score (0–100) representing the inferred likelihood of underlying disease, using a standardized prompt and no task-specific fine-tuning. Discriminative performance was assessed using receiver operating characteristic (ROC) analysis. Misclassification patterns were analysed according to the International ECG Criteria.
Results: GPT-derived scores demonstrated a marked floor effect, with a median value of 0 (IQR 0–2) in both diseased and non-diseased athletes and substantial overlap between groups. The area under the ROC curve was 0.52 (95% CI 0.49–0.55), indicating performance close to random classification. At the Youden-derived threshold, 79% of athletes with confirmed disease were incorrectly classified as negative. False-negative cases were predominantly characterized by borderline ECG patterns (82%), and a substantial number of red-flag ECG abnormalities were also missed.
Conclusions: In this PPS cohort, a general-purpose LLM used in a naïve configuration showed no clinically meaningful ability to discriminate athletes with cardiovascular disease from those without on the basis of their ECGs. Without task-specific training or domain adaptation, such models should not be used for diagnostic triage in athlete screening.
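
The ROC analysis and Youden-derived threshold named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the sixteen toy scores are hypothetical, chosen only to mimic a floor effect in which most scores sit at 0. It assumes a case is called positive when its score is at or above the threshold.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a diseased case outscores a
    non-diseased one, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) that maximises
    Youden's J = sensitivity + specificity - 1.
    Assumption: a case is positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical toy data (NOT study data): 5 diseased, 10 non-diseased,
# with most scores at the floor value of 0.
scores = [0, 0, 0, 2, 40, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
threshold, sensitivity, specificity = youden_threshold(scores, labels)
false_negative_rate = 1 - sensitivity

print(f"AUC = {auc:.2f}")
print(f"Youden threshold = {threshold}, "
      f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}, "
      f"false-negative rate = {false_negative_rate:.2f}")
```

When most scores in both groups are tied at 0, most diseased/non-diseased pairs contribute one half to the AUC, which pulls it toward 0.5. Any threshold above the floor then leaves most diseased cases below it, so the false-negative rate at the Youden threshold stays high.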

Topics

Cardiovascular Effects of Exercise · ECG Monitoring and Analysis · Artificial Intelligence in Healthcare and Education
