This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
ChatGPT provides high‐quality answers to FAQs about high tibial osteotomy despite low inter‐rater agreement
Citations: 0
Authors: 5
Year: 2025
Abstract
Purpose: High tibial osteotomy (HTO) is frequently used to treat knee malalignment in younger patients. Given the rise in online health information‐seeking behaviour, this study aimed to evaluate the quality of ChatGPT‐generated responses to frequently asked questions (FAQs) about HTO and to assess the reliability of two scoring systems used by orthopaedic surgeons.

Methods: Twenty‐four FAQs were submitted to ChatGPT (GPT‐4‐turbo). Four orthopaedic surgeons independently rated the responses at two time points using: (1) a 4‐point categorical scale (1 = excellent, 4 = poor), and (2) a 100‐point numerical scale (0 = worst, 100 = best). Intra‐observer reliability was assessed using weighted kappa (κ) and intraclass correlation coefficients (ICC); inter‐observer agreement was measured using ICC values.

Results: Most responses were rated positively, with over 70% considered ‘excellent’ or requiring minimal clarification. Intra‐observer agreement was variable, ranging from κ = 0.333 to 0.864 and ICC = 0.690–0.922. Inter‐observer agreement was consistently low across both scales (ICC ≤ 0.390).

Conclusion: ChatGPT responses to HTO‐related FAQs were rated as high quality by most evaluators. However, the low inter‐observer agreement highlights the need for standardised evaluation tools and suggests that expert oversight remains essential when integrating AI‐generated content into patient education.

Level of Evidence: Level V.
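The weighted kappa used for intra‐observer reliability can be illustrated with a minimal Python sketch. It compares one rater's scores on the 4‐point scale at two time points. The abstract does not state which weighting scheme the authors used, so the linear default here is an assumption, and the function name `weighted_kappa` and the ratings in the usage note are hypothetical, not study data.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two equally long lists of ordinal ratings.

    `categories` lists the ordered scale values, e.g. [1, 2, 3, 4].
    `weights` is "linear" or "quadratic" (assumed options; the abstract
    does not say which scheme the study used).
    """
    k = len(categories)
    n = len(ratings_a)
    if k < 2 or n == 0 or n != len(ratings_b):
        raise ValueError("need >= 2 categories and two non-empty lists of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions: rows = first rating, columns = second rating.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions give the agreement expected by chance.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Penalty grows with the distance between the two categories.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs / exp
```

As a usage example with made‐up ratings, `weighted_kappa([1, 2, 2, 1, 3], [1, 2, 1, 1, 3], categories=[1, 2, 3, 4])` scores the agreement between two rating sessions. Identical sessions give 1, and agreement no better than chance gives 0.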
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations