This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
ChatGPT-3.5 and the Polish thoracic surgery specialty examination: a performance evaluation
Citations: 0
Authors: 9
Year: 2025
Abstract
Introduction: The rapid development of artificial intelligence (AI) in recent years has created new opportunities for its application in medicine. This raises questions about the reliability and limitations of AI.

Aim: The aim of the present study was to evaluate the effectiveness of the ChatGPT-3.5 language model in solving the test component of the National Specialist Examination (PES) in the field of thoracic surgery.

Material and methods: A total of 120 test questions from the 2015 PES examination were analyzed. They were grouped according to subject matter, clinical character, and cognitive requirements. Each question was submitted five times, in independent sessions. The following statistical tests were applied: χ², Kruskal-Wallis, Mann-Whitney, and Spearman's rank correlation. The consistency of the answers was assessed using Fleiss' κ coefficient.

Results: The AI tool achieved a score of 42.2% correct answers, with the passing threshold set at 60%. A statistically significant difference was found between clinical and non-clinical questions (p = 0.041). Correct answers were characterized by a higher confidence coefficient (p < 0.001). No correlation was observed between confidence and psychometric indicators. The response consistency was assessed as moderate (κ = 0.341).

Conclusions: The result obtained by ChatGPT-3.5 is equivalent to a failing score on the examination. The confidence of responses correlated with their correctness, whereas limitations in clinical knowledge and consistency indicate the need for caution when using this model to assess specialized knowledge.
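The abstract reports answer consistency across five repeated submissions as Fleiss' κ. As a minimal sketch of how such a coefficient can be computed, the Python below treats each question as an item and each repeated session as a "rater". The function name `fleiss_kappa`, the answer labels, and the sample data are illustrative assumptions; the abstract does not describe the authors' actual software or data layout.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical ratings.

    ratings: list of items, each a list of labels, one label per repeated
    session. Every item must have the same number of labels (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: mean over items of the share of agreeing rating pairs.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    categories = set().union(*counts)
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        # Only one category was ever used; kappa is undefined in that case.
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical data: three questions, five sessions each (not from the study).
sample = [
    ["A", "A", "A", "B", "A"],
    ["C", "C", "D", "C", "C"],
    ["B", "D", "B", "A", "B"],
]
print(round(fleiss_kappa(sample), 3))
```

A value of 1 indicates identical answers in every session, and values near 0 indicate agreement no better than chance.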