This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Large language models for generating script concordance test in obstetrics and gynecology: ChatGPT and Claude
Citations: 3
Authors: 5
Year: 2025
Abstract
OBJECTIVE: To evaluate the performance of large language models (ChatGPT-4o and Claude 3.5 Sonnet) in generating script concordance test (SCT) items for assessing clinical reasoning in obstetrics and gynecology. METHODS: This cross-sectional study involved the generation of SCT items for five common diagnostic topics in obstetrics and gynecology in primary care settings. A total of 16 panelists evaluated the AI-generated SCT items against 11 predefined criteria. Descriptive statistics were used to compare the models' performance across criteria. RESULTS: ChatGPT-4o had an overall agreement rate of 90.57% for SCT items meeting the quality criteria, while Claude 3.5 Sonnet achieved 91.48%. The criterion with the lowest scores was "The scenario is of appropriate difficulty for medical students," with ChatGPT-4o rated at 71.25% and Claude 3.5 Sonnet at 76.25%. CONCLUSION: Large language models can generate SCT items that effectively assess clinical reasoning; however, further refinement is required to ensure the appropriate level of difficulty for medical students. These findings highlight the potential of AI to enhance the efficiency of SCT generation in obstetrics and gynecology within primary care settings.