OpenAlex · Updated hourly · Last update: May 9, 2026, 01:10

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Artificial intelligence in oncology: A cross-sectional analysis of ChatGPT and OpenEvidence performance on ALL guidelines

2025 · 0 citations · Blood
Open full text at publisher

Citations: 0
Authors: 4
Year: 2025

Abstract

Background: Artificial intelligence (AI) is becoming increasingly popular among medical professionals. With the creation of ChatGPT and OpenEvidence, physicians and other healthcare workers have convenient access to AI platforms to help with patient care. Within the field of oncology, the National Comprehensive Cancer Network (NCCN) guidelines remain an authoritative standard for treating cancers. Acute lymphoblastic leukemia (ALL) accounts for approximately one third of all childhood malignancies and is the most common cancer in children [1,2]. Approximately 2,500 to 3,500 cases of ALL are diagnosed in children each year, with an annual incidence of approximately 3.4 cases per 100,000 [1]. The goal of this study was to evaluate how ChatGPT and OpenEvidence answer diagnosis- and treatment-related questions regarding ALL.

Methods: We conducted a cross-sectional study to determine how ChatGPT answers diagnosis- and treatment-related questions regarding ALL. We created 10 questions specific to ALL and submitted them under three conditions: ChatGPT without any additional input, ChatGPT with the current NCCN guidelines uploaded, and OpenEvidence. We then reviewed the output and scored the AI-generated answers in three categories: accuracy (scored 0-5, with 0 being a non-answer and 5 being completely accurate), completeness (scored 0-2, with 0 being incomplete and 2 being fully complete), and presence of citations (yes/no).

Results: For ChatGPT without the NCCN guidelines, the mean accuracy was 3.6 (standard deviation 1.7); citations were included 10% of the time. For ChatGPT with the NCCN guidelines uploaded, the mean accuracy was 4.2 (standard deviation 1.3); citations were included 10% of the time. For OpenEvidence, the mean accuracy was 4.6 (standard deviation 1.0); citations were included 100% of the time.
For the accuracy and completeness scores, we performed a one-way analysis of variance (ANOVA) to determine whether the group means differed, defining p < 0.05 as statistically significant. The ANOVA revealed no statistically significant difference between the three groups in mean accuracy (F(2, 27) = 1.36, p = 0.27) or in mean completeness (F(2, 27) = 0.77, p = 0.47).

Discussion: The mean accuracy and completeness of OpenEvidence, ChatGPT with the current NCCN guidelines uploaded, and ChatGPT without the NCCN guidelines were not statistically different from one another. Of note, no AI tool was able to accurately answer questions about preferred treatment options; more data are needed to determine whether more detailed questioning could resolve this. These results are underpowered, and further testing with additional questions is needed. This study highlights the need to assess how AI is used in healthcare: specifically, to determine best practices for choosing the input provided to an AI platform and for selecting which platform to use. AI is constantly evolving, so a key takeaway from this project is determining how best to use it to meet current needs.
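The one-way ANOVA reported above can be sketched in a few lines of plain Python. The three score lists below are hypothetical placeholders standing in for the 10 accuracy scores per condition; they are not the study's actual data, so the resulting F value will not match the reported statistics (only the degrees of freedom, (2, 27), carry over).

```python
# Minimal one-way ANOVA sketch (F statistic only). The score lists are
# hypothetical stand-ins, not the study's actual data.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)                      # number of conditions
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group size times the squared
    # deviation of each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: squared deviations from group means.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical accuracy scores (0-5) for the three conditions.
chatgpt_plain = [3, 4, 2, 5, 4, 3, 1, 5, 4, 5]
chatgpt_nccn  = [4, 5, 3, 5, 4, 4, 2, 5, 5, 5]
openevidence  = [5, 5, 4, 5, 5, 4, 3, 5, 5, 5]

f, dfb, dfw = one_way_anova([chatgpt_plain, chatgpt_nccn, openevidence])
print(f"F({dfb}, {dfw}) = {f:.2f}")  # df matches the reported (2, 27)
```

In practice one would use a library routine (e.g. `scipy.stats.f_oneway`) to also obtain the p-value; the hand-rolled version above just makes the F(2, 27) bookkeeping explicit for three groups of ten scores.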

Related works

Authors

Institutions

Topics

Acute Lymphoblastic Leukemia research · Artificial Intelligence in Healthcare and Education · Lung Cancer Research Studies