This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Artificial intelligence in oncology: A cross-sectional analysis of ChatGPT and OpenEvidence performance on ALL guidelines
Citations: 0
Authors: 4
Year: 2025
Abstract
Background: Artificial intelligence (AI) is becoming increasingly popular among medical professionals. With the advent of ChatGPT and OpenEvidence, physicians and other healthcare workers have convenient access to AI platforms to assist with patient care. Within the field of oncology, the National Comprehensive Cancer Network (NCCN) guidelines remain a governing reference for treating cancers. Acute lymphoblastic leukemia (ALL) accounts for approximately one third of all childhood malignancies and is the most common cancer in children [1,2]. Approximately 2,500 to 3,500 cases of ALL are diagnosed in children each year, with an annual incidence of approximately 3.4 cases per 100,000 [1]. The goal of this study was to evaluate how ChatGPT and OpenEvidence answer diagnosis- and treatment-related questions regarding ALL.

Methods: We conducted a cross-sectional study to determine how ChatGPT answers diagnosis- and treatment-related questions regarding ALL. To do so, we created 10 questions specific to ALL and submitted them to AI platforms under the following conditions: ChatGPT without any additional input, ChatGPT with the current NCCN guidelines uploaded, and OpenEvidence. We then reviewed the output and scored the AI-generated answers in the following categories: accuracy (scored 0-5, with 0 being a non-answer and 5 being completely accurate), completeness (scored 0-2, with 0 being incomplete and 2 being fully complete), and presence of citations (yes/no).

Results: For ChatGPT without NCCN guidelines, the mean accuracy of the output was 3.6 with a standard deviation of 1.7; citations were included 10% of the time. For ChatGPT with the NCCN guidelines uploaded, the mean accuracy was 4.2 with a standard deviation of 1.3; citations were included 10% of the time. For OpenEvidence, the mean accuracy was 4.6 with a standard deviation of 1.0; citations were included 100% of the time.
For the accuracy and completeness scores, we performed a one-way analysis of variance (ANOVA) to determine whether the group means differed. We set p < 0.05 as the threshold for statistical significance. The ANOVA revealed no statistically significant difference in mean accuracy between the three groups (F(2, 27) = 1.36, p = 0.27) and no statistically significant difference in mean completeness between the three groups (F(2, 27) = 0.77, p = 0.47).

Discussion: The mean accuracy and completeness of OpenEvidence, ChatGPT with the current NCCN guidelines uploaded, and ChatGPT without NCCN guidelines are not statistically different from one another. Of note, no AI tool was able to accurately answer questions about preferred treatment options; more data are needed to determine whether more detailed questioning could resolve this issue. These results are underpowered, and further testing with a larger question set is needed. This study highlights the need to assess how we use AI in healthcare: specifically, it emphasizes the need to determine best practices for choosing the input we provide to an AI platform and for selecting which platform to use. AI is constantly evolving, so a key takeaway from this project is determining how best to use it to suit our current needs.
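The reported accuracy ANOVA can be reproduced from the summary statistics alone (three group means, three standard deviations, and 10 questions per condition). A minimal sketch, assuming SciPy is available for the F-distribution tail probability:

```python
from scipy.stats import f

# Summary statistics for accuracy (from the abstract), n = 10 questions per condition
means = [3.6, 4.2, 4.6]  # ChatGPT alone, ChatGPT + NCCN guidelines, OpenEvidence
sds = [1.7, 1.3, 1.0]
n = 10                   # questions per condition
k = len(means)           # number of groups

grand_mean = sum(means) / k
# Between-group and within-group sums of squares computed from summary stats
ss_between = sum(n * (m - grand_mean) ** 2 for m in means)
ss_within = sum((n - 1) * s ** 2 for s in sds)

df_between, df_within = k - 1, k * (n - 1)  # 2 and 27
F = (ss_between / df_between) / (ss_within / df_within)
p = f.sf(F, df_between, df_within)          # upper-tail probability

print(f"F({df_between}, {df_within}) = {F:.2f}, p = {p:.2f}")
# prints "F(2, 27) = 1.36, p = 0.27"
```

This matches the reported result, F(2, 27) = 1.36, p = 0.27, confirming that the summary statistics and the ANOVA are internally consistent. The completeness ANOVA cannot be checked the same way because the abstract does not report the completeness means and standard deviations.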
Similar works
CBTRUS Statistical Report: Primary Brain and Central Nervous System Tumors Diagnosed in the United States in 2006-2010
2013 · 12,094 citations
Cancer treatment and survivorship statistics, 2016
2016 · 6,151 citations
CHOP Chemotherapy plus Rituximab Compared with CHOP Alone in Elderly Patients with Diffuse Large-B-Cell Lymphoma
2002 · 5,535 citations
Tisagenlecleucel in Children and Young Adults with B-Cell Lymphoblastic Leukemia
2018 · 5,533 citations
Chimeric Antigen Receptor T Cells for Sustained Remissions in Leukemia
2014 · 5,365 citations