This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
CORR Insights®: What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review
Citations: 10
Authors: 1
Year: 2019
Abstract
Where Are We Now?

In the current study, Langerhuizen and colleagues [4] reviewed the current state of research regarding the use of artificial intelligence (AI) for clinical image analysis, including the detection and classification of fractures. Their study serves as an introduction for orthopaedic surgeons who have heard of AI, but know little about its specific uses and shortcomings. Potentially, AI could ensure better detection of subtle fractures, while also making it easier to verify that fractures are not overlooked in patients with multiple injuries.

The development of large digital image libraries, combined with considerable advancements in computing power, has increased interest in computer-based learning systems among orthopaedic surgeons. The types and capabilities of orthopaedic tools that use AI, machine learning, or both continue to expand. One promising approach involves neural networks, which use computer-based learning systems and large datasets to build a statistical model that can detect fractures. Although one paper on the topic suggested that AI was better than 101 radiologists at detecting breast cancer [5], such technology is not infallible. AI does not constitute “intelligence” in the everyday sense of the word. Instead, it is a way to predict statistically whether what we are looking for is present in the image. The computer, or algorithm, itself has no sense of what it is looking at beyond a collection of pixels in an image.

Where Do We Need To Go?

Although the AI-based tools evaluated by Langerhuizen and colleagues [4] generally diagnosed fractures slightly better than orthopaedic surgeons did, the differences between the machines and the surgeons generally were not large, and, importantly, the fractures in some of these reports were quite obvious. We know that surgeons sometimes miss more-subtle fractures, and I suspect the utility of AI as a supplement to human knowledge and expertise will matter most for less-obvious fractures and fractures in difficult-to-detect locations. Future studies should specifically evaluate AI tools in more-challenging clinical scenarios.

Image-based classifications of proximal humeral fractures are generally not reliable, and if AI bases its classifications on the opinions of orthopaedic surgeons, the end result may be an AI system that reproduces the inaccuracies of humans. However, since no imaging classification is perfect, it may be difficult (if not impossible) to avoid introducing this kind of bias into the training of AI-based tools. Many image-based classifications (Neer being the textbook example) are built on false assumptions that have been shown to lead to very poor reproducibility [7]. The issue is not that the surgeons are biased in doing the readings, but that the classification has bias built into it that cannot be avoided.

Currently, the AI system is a complete “black box”: there is no way to discern how or why the computer arrived at its answer. In the clinical setting, where life-altering decisions are made based on imaging, this is not acceptable, either for the patient or the clinician. Explainable AI [3, 6] is a recent area of research designed to permit insight into the rationale behind an answer. It is too early to know whether this effort will be successful, so we will need additional studies to better understand explainable AI.
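For readers curious what “explainable AI” can look like in practice, below is a minimal sketch of one common technique, gradient saliency, which highlights the pixels that most influenced a model’s prediction. Everything here is a hypothetical placeholder (a generic, untrained CNN and a random tensor standing in for a radiograph); it is not the method of any tool reviewed in the study, only an illustration of the idea.

```python
import torch
import torchvision.models as models

# Hypothetical stand-in for a fracture classifier; untrained weights,
# used purely to illustrate the gradient-saliency technique.
model = models.resnet18(weights=None)
model.eval()

# A random tensor standing in for one preprocessed radiograph.
image = torch.rand(1, 3, 224, 224, requires_grad=True)

logits = model(image)
score = logits[0, logits.argmax()]  # score of the predicted class
score.backward()                    # gradients of that score w.r.t. input pixels

# Pixels with large absolute gradients influenced the prediction most;
# overlaying this map on the radiograph hints at *where* the model "looked".
saliency = image.grad.abs().max(dim=1).values  # shape: (1, 224, 224)
```

A saliency overlay of this kind addresses exactly the “black box” concern raised above: a clinician can at least check whether the highlighted region corresponds to the suspected fracture site rather than to an irrelevant artifact.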
How Do We Get There?

Most of the current limitations regarding the use of AI in fracture detection, such as reliance on poorly reproducible classification systems, the inability to understand which image details an AI algorithm uses to make its diagnoses, and the lack of studies applying AI to fractures that are difficult to visualize, should be amenable to appropriately structured studies. Such studies could base the fracture classification on the findings at surgery (a better gold standard than current imaging in most instances) or employ “explainable AI” models that would permit investigators to understand which features of the images are key for accurate diagnosis. Prospective studies will be needed to discern the ability of AI to detect subtle fractures that are visible only on advanced imaging or in the operating room. To establish AI’s efficacy in identifying what would otherwise be missed, prospective studies will need sufficient follow-up time after the initial injury to capture initially missed fractures that become apparent only through late sequelae.

The issue of potential unintended biases embedded in classifications is difficult to address in the absence of an independent standard against which the imaging classification can be compared. At this point, the best candidates for such a standard would be either advanced imaging or surgical findings. Unfortunately, advanced imaging does not always provide unambiguous information [1]. Even surgery is not a perfect solution, as most fractures are not treated surgically, and fractures that are missed (or misclassified) are also not operated upon. Consequently, the best method to establish a “gold standard” against which to compare the diagnostic acumen of AI will be a combination of surgical findings and extended clinical follow-up to detect missed or otherwise non-operatively treated fractures. It will be essential to conduct prospective studies that rely on current clinical treatment practices and collect data from clinical treatment, surgical findings, and clinical outcomes.

If prospective trials (which have not yet been undertaken) show improved accuracy with AI, then follow-up studies, possibly including randomized controlled trials, could compare the accuracy of diagnosis and subsequent outcomes for patients treated with or without the aid of AI image interpretation. In this scenario, AI would be used as an adjunct to human image interpretation, either through concurrent interaction or with AI as an initial screening tool. Although AI by itself is unlikely to be employed clinically, AI has already been incorporated into the clinical decision-making process [2]. In these circumstances, techniques to estimate probabilities, of survival, for example, have been built in part on AI-based analysis. These probabilities are intended to help the clinician and patient make treatment decisions. However, given the highly variable error rates seen in these circumstances (in part due to the uncertainties of the underlying clinical issues), such algorithms are not used to make treatment decisions independently. In the case of fractures, where decision-making is frequently binary (surgery or non-operative treatment) once the fracture is fully characterized, AI will be most useful as an aid to human image interpretation until its error rate improves substantially.
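The “initial screening tool” role described above can be made concrete with a small sketch. The function below is entirely hypothetical: the threshold value and routing labels are illustrative assumptions, not part of the reviewed study, and the AI output is used only to prioritize human review, never to make the treatment decision.

```python
# A hypothetical sketch of AI as an adjunct screening tool: the model's
# fracture probability routes a study toward priority or routine human
# review; a radiologist reads every study either way.

def triage_radiograph(ai_fracture_probability: float,
                      flag_threshold: float = 0.2) -> str:
    """Route a study based on a model's output probability.

    The threshold is an illustrative value; in practice it would be
    chosen from prospective data to favor sensitivity over specificity,
    so that subtle fractures are not screened out.
    """
    if ai_fracture_probability >= flag_threshold:
        return "priority human read: possible fracture flagged by AI"
    return "routine human read: no fracture flagged, radiologist still reviews"

print(triage_radiograph(0.87))  # -> priority human read
print(triage_radiograph(0.05))  # -> routine human read
```

The deliberately low default threshold reflects the screening rationale discussed above: a false flag costs only a prioritized read, whereas a missed subtle fracture carries the late sequelae this commentary warns about.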
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,316 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,177 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,575 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,468 citations