This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluation of Artificial Intelligence Accuracy in Interpreting Pulmonary Function Tests: A Comparison of ChatGPT 4o, DeepSeek R1, and Claude 3.5 Sonnet

2025 · 0 citations

Year: 2025

Abstract

Large language models (LLMs) offer potential for clinical decision support, yet their role in spirometry interpretation has been little studied. This study evaluates the accuracy, clinical utility, and performance of ChatGPT 4o, DeepSeek R1, and Claude 3.5 Sonnet.

Materials and Methods: Spirometry results from 100 randomly selected patients at Ege University were analyzed retrospectively. Interpretations by two independent pediatric pulmonologists served as the reference standard. ChatGPT, DeepSeek, and Claude were evaluated against six ATS/ERS criteria. Each LLM received the same standardized prompt, asking it to assess test acceptability and clinical usability and to classify the pattern as Normal, Obstructive, Restrictive, or Mixed.

Results: ChatGPT showed the highest agreement with the experts (κ = 0.51 for pattern classification), while DeepSeek and Claude showed lower concordance (κ = 0.29 each). Mixed and restrictive patterns were the most frequently misclassified. Logistic regression identified expiratory time as the key factor in the LLMs' decision-making.

Conclusion: This study highlights both the potential and the limitations of LLMs in spirometry interpretation. Although they cannot replace expert judgment, LLMs can improve clinical efficiency. ChatGPT was the most accurate model, but further optimization is needed for complex cases. Future research should focus on improving model training, real-time learning, and clinical integration. This study provides one of the first comparisons of AI models for spirometry interpretation.

Table 1. Agreement of each model with the expert reference (κ, p-value)

Model      Usability (κ, p)    Pattern classification (κ, p)
ChatGPT    0.48, 0.82          0.51, <0.05
Claude     0.09, <0.05         0.29, <0.05
DeepSeek   0.00, <0.05         0.29, <0.05
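The κ values in the abstract's table are chance-corrected agreement statistics between each model's labels and the expert reference. The abstract does not say which κ statistic was used; assuming an unweighted Cohen's kappa over the four pattern labels, a minimal Python sketch of the calculation follows. The function name `cohens_kappa` and the example labels are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items.

    Hypothetical helper for illustration; assumes nominal (unordered) labels.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement p_o: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: sum over labels of the product of both raters'
    # marginal frequencies (a label missing from rater_b counts as 0).
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item: kappa is 0/0.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (not study data): expert reference vs. one model.
expert = ["Normal", "Obstructive", "Restrictive", "Mixed", "Normal", "Obstructive"]
model = ["Normal", "Obstructive", "Normal", "Obstructive", "Normal", "Obstructive"]
print(round(cohens_kappa(expert, model), 3))  # prints 0.5
```

In this made-up example the raters agree on 4 of 6 cases (p_o = 2/3), chance agreement from the marginal frequencies is p_e = 12/36 = 1/3, so κ = (2/3 − 1/3) / (1 − 1/3) = 0.5. For real analyses, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic by default.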

Topics: Chronic Obstructive Pulmonary Disease (COPD) Research; Artificial Intelligence in Healthcare and Education; Inhalation and Respiratory Drug Delivery
