OpenAlex · Updated hourly · Last updated: 27.03.2026, 07:56

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Comparison of GPT-5 Responses With the Official Results of the Polish Specialized Psychiatric Examination in Child and Adolescent Psychiatry

2025 · 1 citation · Cureus · Open Access

Citations: 1 · Authors: 15 · Year: 2025

Abstract

Introduction: Artificial intelligence (AI), particularly language models such as ChatGPT (OpenAI, San Francisco, CA, USA), is becoming increasingly important in medical education and knowledge assessment. Prior studies have demonstrated the growing effectiveness of AI in preparing students for medical examinations, including the Medical Final Examination (Lekarski Egzamin Końcowy (LEK)) of Poland and the National Specialty Examination across various disciplines. This raises important questions regarding its potential role as a tool to support specialist training.

Objective: The aim of this study is to evaluate the effectiveness of the advanced GPT-5 model in addressing problems in child and adolescent psychiatry. The focus is on the correctness of the model's answers and its self-declared confidence levels, in order to assess its potential efficacy in education.

Methodology: The study analyzed the official spring 2025 National Specialty Examination (Państwowy Egzamin Specjalizacyjny (PES)) of Poland in child and adolescent psychiatry. The exam consisted of 120 multiple-choice questions, each with a single correct answer. GPT-5 was familiarized with the examination rules and then presented with the questions in Polish. Answers were evaluated against the official Centre for Medical Examination (CEM) key. In addition, the model provided a confidence rating for each answer on a five-point scale. Questions were categorized as either clinical or theoretical. Statistical analysis was conducted using the chi-square test and the Mann-Whitney U test.

Results: GPT-5 answered 97 questions correctly (80.8%), surpassing the required passing threshold. No significant difference was observed between the accuracy of responses to clinical versus theoretical questions (p = 0.399). However, correct answers were significantly more likely when the model reported higher confidence levels (p = 0.012).

Conclusions: GPT-5 demonstrated strong performance in the National Specialty Examination of Poland in child and adolescent psychiatry, supporting its potential as a supplementary tool in specialist education. Confidence ratings may provide an additional metric for evaluating the reliability of answers. Nevertheless, broader integration of AI in medical education requires expert oversight and further research across diverse medical disciplines.
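The analysis described in the Methodology section can be sketched as follows. Only the overall totals (120 questions, 97 correct, 80.8%) come from the abstract; the per-question contingency counts and confidence ratings below are invented for illustration, since the paper's raw data are not reproduced here.

```python
# Sketch of the scoring and statistical analysis described in the abstract.
# Assumption: SciPy's chi2_contingency and mannwhitneyu are used, matching
# the chi-square and Mann-Whitney U tests named in the Methodology.
from scipy.stats import chi2_contingency, mannwhitneyu

# Hypothetical split of the 97/120 correct answers by question type
# (the paper reports no significant clinical-vs-theoretical difference).
#                correct  incorrect
contingency = [[55, 15],   # clinical (assumed counts)
               [42,  8]]   # theoretical (assumed counts)
chi2, p_type, dof, _ = chi2_contingency(contingency)

# Hypothetical five-point confidence ratings grouped by correctness
# (the paper reports higher confidence on correct answers, p = 0.012).
conf_correct   = [5, 5, 4, 5, 4, 5, 3, 4, 5, 4]
conf_incorrect = [3, 2, 4, 3, 2, 3, 4, 2]
u_stat, p_conf = mannwhitneyu(conf_correct, conf_incorrect,
                              alternative="greater")

accuracy = 97 / 120
print(f"accuracy = {accuracy:.1%}")        # → accuracy = 80.8%
print(f"question-type effect p = {p_type:.3f}")
print(f"confidence effect p = {p_conf:.3f}")
```

The chi-square test compares correct/incorrect counts across the two question categories, while the one-sided Mann-Whitney U test asks whether confidence ratings on correct answers stochastically dominate those on incorrect ones, mirroring the two comparisons reported in the Results.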
