OpenAlex · Updated hourly · Last updated: 20.05.2026, 08:35

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Deriving the OTA/AO fracture classification from routinely collected radiology reports using a large language model

2026 · 0 citations · OTA International: The Open Access Journal of Orthopaedic Trauma · Open Access

Citations: 0 · Authors: 11 · Year: 2026

Abstract

Objectives: Fracture classification plays a pivotal role in research and quality assurance. Despite its wide acceptance, the OTA/AO classification is seldom documented in patients' electronic medical records, which impedes fracture registry creation and effective interdisciplinary communication. In this study, we investigate how well "off-the-shelf" large language models (LLMs) translate free text in radiology reports into OTA/AO classification labels.

Methods: We employed a Health Insurance Portability and Accountability Act-compliant LLM to classify 109 fracture descriptions from randomly selected radiology reports in a deidentified electronic medical record database. Ground-truth classifications were assigned by expert orthopaedic traumatologists based on the corresponding radiographs. Multiple prompting strategies were tested, including zero-shot prompting, zero-shot chain-of-thought prompting, and retrieval-augmented generation. We additionally asked the LLM to assign classification labels to "ideal" fracture descriptions written according to the 2018 OTA/AO Fracture and Dislocation Classification Compendium. Model performance was assessed using Cohen's kappa and accuracy against ground-truth labels.

Results: The 3 prompting strategies tested yielded similar classification performance on radiology report fracture descriptions, with almost perfect agreement at the bone and bone-and-location levels. Performance declined to slight agreement at the subgroup level. The best performance was observed using ideal fracture descriptions with retrieval-augmented generation, in which the agreement between the full LLM-generated and ground-truth labels remained moderate. Classification errors were largely due to imprecise descriptions, hallucinated information, or incorrect application of factually correct information.

Conclusions: Our study demonstrates some potential for LLMs to translate free-text fracture descriptions into OTA/AO classifications, allowing for efficient labeling of large datasets of radiology reports. Future work should focus on refining model classification capabilities using more sophisticated prompting methods.

Level of Evidence: Level III.
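
The evaluation described in the abstract (accuracy and Cohen's kappa at successively finer classification levels) can be illustrated with a minimal Python sketch. It is not the authors' code. The labels below are invented toy data, not the study's. The prefix lengths in `LEVELS` assume simplified codes of the form `42A1` (bone, location, type, group) and leave out subgroups. The `landis_koch` helper assumes that the abstract's verbal labels ("slight", "moderate", "almost perfect") follow the commonly used Landis and Koch bands, which the abstract does not state.

```python
from collections import Counter

# Assumed prefix lengths for each level of a simplified code such as "42A1":
# bone "4", bone and location "42", type "42A", group "42A1".
LEVELS = {"bone": 1, "bone_location": 2, "type": 3, "group": 4}


def truncate(codes, level):
    """Cut every code down to the prefix that defines the given level."""
    return [code[:LEVELS[level]] for code in codes]


def accuracy(truth, pred):
    """Fraction of items on which the two label lists agree."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def cohen_kappa(truth, pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(truth)
    p_o = accuracy(truth, pred)
    truth_counts, pred_counts = Counter(truth), Counter(pred)
    # Chance agreement: product of the two raters' marginal frequencies.
    p_e = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if p_e == 1.0:  # both raters used one and the same label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal label for kappa, assuming the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented toy labels for illustration only (not data from the study).
truth = ["42A1", "42A1", "31B2", "23C1", "31A2", "42B2"]
pred = ["42A1", "42A2", "31B2", "23C3", "32A2", "42B2"]

for level in LEVELS:
    t, p = truncate(truth, level), truncate(pred, level)
    k = cohen_kappa(t, p)
    print(f"{level}: accuracy={accuracy(t, p):.2f}, "
          f"kappa={k:.2f} ({landis_koch(k)})")
```

Truncating both label lists to a shared prefix before scoring lets a single pair of metric functions report agreement at every level, and shows where agreement falls off as the code gets more specific.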

Similar works