OpenAlex · Updated hourly · Last updated: 21 May 2026, 08:26

This is an overview page with metadata for this scholarly article. The full article is available from the publisher.

AI-generated questions for urological competency assessment: a prospective educational study

2025 · 4 citations · BMC Medical Education · Open Access

Citations: 4 · Authors: 3 · Year: 2025

Abstract

BACKGROUND: The integration of artificial intelligence (AI) into medical education assessment remains largely unexplored, particularly for specialty-specific evaluations during clinical rotations. Traditional question development is time-intensive and often struggles to keep pace with evolving medical knowledge. This study evaluated the effectiveness of AI-generated questions in assessing urological competency among medical interns during a standardized clinical rotation.

METHODS: Two state-of-the-art AI language models (ChatGPT and Gemini) generated 300 multiple-choice questions across six urological subspecialties. Seven experienced urologists, each with over 10 years of clinical practice and active involvement in resident training programs, independently evaluated the questions using a modified Delphi approach with standardized scoring rubrics. The evaluation criteria encompassed technical accuracy based on current clinical guidelines, clinical relevance to core rotation objectives, construct validity assessed through cognitive task analysis, and alignment with rotation objectives. Questions approved by at least five of the seven experts were retained, yielding 100 validated questions.

RESULTS: Of the initial cohort of 45 eligible interns, 42 completed all three assessment points (93.3% completion rate). Performance improved significantly from baseline (mean: 45.2%, 95% CI: 42.6-47.8%) through mid-rotation (mean: 62.8%, 95% CI: 60.4-65.2%) to the final assessment (mean: 78.4%, 95% CI: 76.5-80.3%). Technical accuracy was comparable between the two AI platforms (ChatGPT: 84.3%, Gemini: 83.8%, p = 0.86). Clinical scenario questions discriminated better than recall questions (mean discrimination indices: 0.28 vs. 0.14, p < 0.001). Performance varied by subspecialty, with the highest scores in uro-oncology (mean: 82.6%, 95% CI: 80.2-85.0%) and endourology (mean: 79.4%, 95% CI: 77.0-81.8%).

CONCLUSIONS: AI-generated questions showed appropriate technical accuracy and difficulty levels for assessing clinical competency in urology. They are promising for formative assessment, particularly in clinical-scenario format, but their current limitations in discrimination warrant caution before use in high-stakes testing. The strong correlation between clinical exposure and improved performance validates the questions' effectiveness in measuring knowledge acquisition. These findings support the potential integration of AI-generated questions into specialty-specific assessment, provided that implementation includes expert oversight and continuous validation.

CLINICAL TRIAL NUMBER: Not applicable.
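The abstract relies on two quantitative rules: a question was kept only if at least five of the seven experts approved it, and question types were compared by their discrimination indices (0.28 vs. 0.14). The abstract does not say how the discrimination index was computed, so the sketch below assumes the classical upper-lower group index (proportion correct in the top 27% of examinees by total score minus the proportion correct in the bottom 27%). That method, the helper names `retained_by_consensus` and `discrimination_index`, and the toy data are all hypothetical illustrations, not the authors' procedure or results.

```python
from typing import Sequence


def retained_by_consensus(approvals: Sequence[int], threshold: int = 5) -> list[int]:
    """Return indices of questions approved by at least `threshold` experts.

    Hypothetical helper: approvals[i] is how many of the seven experts
    approved question i. The ">= 5 of 7" rule is taken from the abstract.
    """
    return [i for i, n in enumerate(approvals) if n >= threshold]


def discrimination_index(item_correct: Sequence[int],
                         total_scores: Sequence[float],
                         fraction: float = 0.27) -> float:
    """Upper-lower discrimination index for a single item (assumed method).

    item_correct[k] is 1 if examinee k answered the item correctly, else 0;
    total_scores[k] is examinee k's total test score. Examinees are ranked
    by total score and D = p_upper - p_lower, where p is the proportion
    answering the item correctly in the top and bottom `fraction` groups.
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("item_correct and total_scores must have equal length")
    n = len(total_scores)
    group = max(1, round(n * fraction))
    # Rank examinees from highest to lowest total score (stable for ties).
    order = sorted(range(n), key=lambda k: total_scores[k], reverse=True)
    upper = order[:group]
    lower = order[-group:]
    p_upper = sum(item_correct[k] for k in upper) / group
    p_lower = sum(item_correct[k] for k in lower) / group
    return p_upper - p_lower


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    print(retained_by_consensus([7, 4, 5, 6, 3]))  # [0, 2, 3]
    totals = [90, 85, 80, 70, 60, 55, 50, 40, 30, 20]
    item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
    print(round(discrimination_index(item, totals), 2))  # 0.67
    # Completion rate reported in the abstract: 42 of 45 interns.
    print(round(42 / 45 * 100, 1))  # 93.3
```

The index ranges from -1 to 1; higher values mean the item separates stronger from weaker examinees more sharply, which is the sense in which the clinical scenario questions (0.28) outperformed the recall questions (0.14).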


Topics: Artificial Intelligence in Healthcare and Education · Surgical Simulation and Training · Simulation-Based Education in Healthcare