This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Confirming SPSS Results With ChatGPT-4 and o3-mini Models
Citations: 0
Authors: 6
Year: 2025
Abstract
Background: This research compared the simple and advanced statistical results of SPSS (IBM Corp., Armonk, NY, USA) with ChatGPT-4 and ChatGPT o3-mini (OpenAI, San Francisco, CA, USA) in statistical data output and interpretation with behavioral healthcare data. It evaluated their methodological approaches, quantitative performance, interpretability, adaptability, ethical considerations, and future trends.
Methods: Fourteen statistical analyses were conducted from two real datasets that produced peer-reviewed, published scientific articles in 2024. Descriptive statistics, Pearson r, multiple correlation with Pearson r, Spearman's rho, simple linear regression, one-sample t-test, paired t-test, two-independent sample t-test, multiple linear regression, one-way analysis of variance (ANOVA), repeated measures ANOVA, two-way (factorial) ANOVA, and multivariate ANOVA were computed. The two datasets adhered to a systematically structured timeframe, March 19, 2023, through June 11, 2023, and June 7, 2023, through July 7, 2023, thereby ensuring the integrity and temporal representativeness of the data gathering. The analyses were conducted by inputting the verbal (text) commands into ChatGPT-4 and ChatGPT o3-mini along with the relevant SPSS variables, which were copied and pasted from the SPSS datasets.
Results: The study found high concordance between SPSS and ChatGPT-4 in fundamental statistical analyses, such as measures of central tendency, variability, and simple Pearson and Spearman correlation analyses, where the results were nearly identical. ChatGPT-4 also closely matched SPSS in the three t-tests and simple linear regression, with minimal effect size variations. Discrepancies emerged in complex analyses. ChatGPT o3-mini showed inflated correlation values and significant results where none were expected, indicating computational deviations. ChatGPT o3-mini produced inflated coefficients in the multiple correlation and R-squared values in two-way ANOVA and multiple regression, suggesting differing assumptions. ChatGPT-4 and ChatGPT o3-mini produced identical F-statistics with repeated measures ANOVA but reported incorrect degrees of freedom (df) values. While ChatGPT-4 performed well in one-way ANOVA, it miscalculated degrees of freedom in multivariate ANOVA (MANOVA), leading to significant discrepancies. ChatGPT o3-mini also generated erroneous F-statistics in factorial ANOVA, highlighting the need for further optimization in multivariate statistical modeling.
Conclusions: This study underscored the rapid advancements in artificial intelligence (AI)-driven statistical analyses while highlighting areas that require further refinement. ChatGPT-4 accurately executed fundamental statistical tests, closely matching SPSS. However, its reliability diminished in more advanced statistical procedures, requiring further validation. ChatGPT o3-mini, while optimized for Science, Technology, Engineering, and Mathematics (STEM) applications, produced inconsistencies in correlation and multivariate analyses, limiting its dependability for complex research applications. Ensuring its alignment with established statistical methodologies will be essential for widespread scientific research adoption as AI evolves.
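The kind of cross-check the abstract describes (recomputing a correlation and confirming reported degrees of freedom) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names, the tolerance, and any sample values are hypothetical and are not taken from the study's datasets. The repeated measures formula assumes sphericity, with no Greenhouse-Geisser or similar correction applied.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def one_way_anova_df(group_sizes):
    """(df_between, df_within) for a one-way ANOVA: k - 1 and N - k."""
    k = len(group_sizes)
    n_total = sum(group_sizes)
    return k - 1, n_total - k


def repeated_measures_anova_df(n_subjects, n_conditions):
    """(df_effect, df_error) for a one-factor repeated measures ANOVA,
    assuming sphericity: k - 1 and (k - 1)(n - 1)."""
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    return df_effect, df_error


def agrees(reported, recomputed, tol=1e-3):
    """True when a reported statistic matches the recomputed one within tol
    (tol is an illustrative choice, not a standard)."""
    return abs(reported - recomputed) <= tol
```

A reported value from either tool can then be compared against the recomputed one, for example `agrees(reported_r, pearson_r(x, y))`, and a reported df pair can be checked for equality with the tuple returned by the matching helper.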