
Comparison of GPT-4o With Human Performance in the Polish Vascular Surgery Specialty Examination

2025 · Cureus · Open Access · 0 citations · 16 authors
Abstract

Background: Artificial intelligence (AI) offers many possibilities through language models such as ChatGPT, a generative AI chatbot developed by OpenAI (OpenAI, Inc., San Francisco, USA). Harnessing AI in medicine could provide a crucial tool for assessing medical expertise and holds promise for medical education. Prior investigations have documented the steadily improving performance of AI systems on medical problems, including evaluations of Polish medical examinations such as the State Specialization Examination (PES) discussed in this article. These findings have stimulated scholarly debate on whether such technologies can serve as instruments for enhancing postgraduate specialist education and training.

Objective: This study evaluated the performance of the ChatGPT-4o model on the PES in vascular surgery. The analysis examined both the correctness of the answers and the model's stated confidence, with the goal of understanding its potential value in education.

Methods: The study used the official PES in vascular surgery from a previous session, the Spring 2025 edition, comprising 120 multiple-choice items. The ChatGPT-4o model was briefed on the examination regulations beforehand, and all items were presented in Polish. Response accuracy was scored against the answer key of the Medical Examination Center (CEM) in Łódź, and the model's self-reported confidence was recorded on a five-point scale. Statistical analyses used the chi-square test to compare categorical variables and the Mann-Whitney U test to assess differences between non-normally distributed continuous variables.

Results: ChatGPT-4o answered 88 of 120 items correctly (73.3%), surpassing the minimum passing criterion for the examination. There was no significant difference in accuracy between clinical and non-clinical questions (p=0.561). The model's self-reported confidence did not correlate meaningfully with its response accuracy: while ChatGPT can signal doubt, it cannot consistently predict its own performance, highlighting ongoing limitations in the model's self-assessment capabilities.

Conclusions: ChatGPT-4o achieved satisfactory results on the PES vascular surgery examination, highlighting AI's promise in specialist education, particularly as a support for learning a narrowly specialized field of medicine. ChatGPT should be treated as a supporting educational tool, not as a sole source of knowledge. These findings indicate that advanced AI models may serve as valuable tools in specialist education. Nonetheless, careful oversight by medical professionals and additional validation studies across various medical fields are necessary before AI models can be widely implemented in medical education.
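As a rough illustration of the analysis described in the Methods, the sketch below runs the two named tests with SciPy. The contingency table and confidence ratings are invented placeholders, not the paper's data; only the overall split of 88 correct versus 32 incorrect answers comes from the abstract.

```python
# Minimal sketch of the statistical comparisons named in the abstract,
# using hypothetical data (the paper's per-item results are not shown here).
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

# Hypothetical 2x2 contingency table: rows = correct/incorrect answers,
# columns = clinical/non-clinical question type. The counts are invented
# but sum to the reported 88 correct and 32 incorrect answers.
table = np.array([[62, 26],   # correct answers
                  [23,  9]])  # incorrect answers
chi2, p_chi, dof, _ = chi2_contingency(table)
print(f"chi-square: chi2={chi2:.3f}, dof={dof}, p={p_chi:.3f}")

# Hypothetical five-point confidence ratings for correct vs. incorrect
# answers; the Mann-Whitney U test suits ordinal, non-normal data.
rng = np.random.default_rng(42)
conf_correct = rng.integers(1, 6, size=88)    # ratings on a 1-5 scale
conf_incorrect = rng.integers(1, 6, size=32)
u_stat, p_u = mannwhitneyu(conf_correct, conf_incorrect)
print(f"Mann-Whitney U: U={u_stat:.1f}, p={p_u:.3f}")
```

With the paper's actual per-item data substituted for the placeholders, the chi-square test reproduces the reported clinical vs. non-clinical comparison (p=0.561) and the Mann-Whitney U test the confidence-accuracy comparison.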
