This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Comparative analysis of the performance of the large language models ChatGPT-3.5, ChatGPT-4 and Open AI-o1 in the field of Programmed Cell Death in myeloma

2025 · 1 citation · Discover Oncology · Open Access

Abstract

OBJECTIVE: This study aimed to compare the performance of three large language models (LLMs), ChatGPT-3.5, ChatGPT-4, and Open AI-o1, in addressing clinical questions related to Programmed Cell Death in multiple myeloma. By evaluating each model's accuracy, comprehensiveness, and self-correcting capabilities, the investigation sought to determine the most effective tool for supporting clinical decision-making in this specialized oncological context.

METHODS: A set of forty clinical questions was curated from recent high-impact oncology journals, International Myeloma Working Group (IMWG) guidelines, and reputable medical databases, covering various aspects of Programmed Cell Death in multiple myeloma. These questions were refined and validated by a panel of four hematologist-oncologists with expertise in the field. Each question was posed individually to ChatGPT-3.5, ChatGPT-4, and Open AI-o1 in controlled sessions. Responses were anonymized and evaluated by the same panel using a five-point Likert scale assessing accuracy, depth, and completeness. Responses were categorized as "excellent," "satisfactory," or "insufficient" based on cumulative scores. Additionally, the models' self-correcting abilities were assessed by providing feedback on initially insufficient responses and re-evaluating the revised answers. Interrater reliability was measured using Cohen's Kappa coefficients.

RESULTS: Open AI-o1 consistently generated the most extensive and detailed responses, achieving significantly higher total scores across all domains than ChatGPT-3.5 and ChatGPT-4. It had the lowest proportion of "insufficient" responses and the highest percentage of "excellent" answers, and it performed particularly well on guideline-based questions. Open AI-o1 also showed superior self-correcting capacity, improving its responses after receiving feedback. Evaluations of its responses also showed the highest Cohen's Kappa coefficient among the models, indicating greater consistency among the clinical experts' ratings. User satisfaction surveys revealed that 85% of hematologist-oncologists rated Open AI-o1 as "highly satisfactory," compared to 60% for ChatGPT-4 and 45% for ChatGPT-3.5.

CONCLUSION: Open AI-o1 outperforms ChatGPT-3.5 and ChatGPT-4 in accuracy, depth, and reliability when addressing complex clinical questions related to Programmed Cell Death in multiple myeloma. Its advanced "thinking" ability facilitates comprehensive and evidence-based responses, making it a more dependable tool for clinical decision support. These findings suggest that Open AI-o1 holds significant potential for enhancing clinical practice in specialized oncological fields, though ongoing validation and integration with human expertise remain essential.
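
The abstract reports interrater reliability as Cohen's Kappa. As a minimal sketch of that statistic for a single pair of raters, the Python below computes kappa from two lists of category labels. The helper name `cohen_kappa` and the ratings are hypothetical illustrations, not data or code from the study, and the abstract does not say how agreement was combined across the four panel members.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items that received the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for six responses (NOT taken from the study).
rater_1 = ["excellent", "excellent", "satisfactory",
           "insufficient", "satisfactory", "excellent"]
rater_2 = ["excellent", "satisfactory", "satisfactory",
           "insufficient", "satisfactory", "excellent"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on five of six items (observed agreement 5/6) while chance agreement is 13/36, so kappa is 17/23, roughly 0.739. With a panel of more than two raters, one common approach is to report kappa for each pair of raters.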

Topics

Artificial Intelligence in Healthcare and Education; Multiple Myeloma Research and Treatments; Lung Cancer Research Studies