This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing GPT-4 performance and alignment with human expertise for text-based MRI protocol assignment
Citations: 0
Authors: 3
Year: 2026
Abstract
Accurate imaging protocol assignment is a critical component of clinical radiology workflows. This study evaluates the extent to which GPT-4’s decision-making aligns with human expertise in text-based imaging protocol classification and compares its performance and interpretability with established fine-tuned models. We tested GPT-4, a fine-tuned BERT model, and a fine-tuned LLaMA-3 model on physician-entered clinical indications corresponding to 11 head MRI protocol categories. Model predictions were benchmarked against expert-validated ground truth labels using F1 scores, and each model’s reasoning coherence was reviewed by a board-certified radiologist. To explore a practical decision-support workflow, we also evaluated LLaMA-3-assisted prompting, in which a fine-tuned classifier’s prediction is provided to GPT-4 as additional context for optional revision and explanation. GPT-4 achieved an F1 score of 0.83, compared with 0.91 for BERT, 0.93 for LLaMA-3, and 0.96 for human experts. Although GPT-4 underperformed in raw classification accuracy, it consistently produced the most interpretable, human-like explanations, demonstrating a nuanced understanding of clinical language and imaging rationale. When the prompt was conditioned on the LLaMA-3 prediction, GPT-4’s accuracy improved substantially, suggesting that structured collaboration between general and specialized models can enhance both performance and transparency. Overall, these findings indicate that GPT-4 can provide interpretable, text-based outputs that may support clinical decision-making when used with appropriate safeguards.
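As a rough illustration of the F1 benchmarking mentioned in the abstract, the sketch below computes per-class F1 and its unweighted (macro) average for a multi-class protocol classifier. The abstract does not say which averaging scheme the authors used, so macro averaging is an assumption here, and the function names and protocol labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in the truth or the predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong for this case
            fn[truth] += 1  # true label was missed for this case
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the abstract)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical protocol labels, for illustration only
truth = ["brain", "brain", "pituitary", "iac"]
preds = ["brain", "pituitary", "pituitary", "iac"]
print(round(macro_f1(truth, preds), 4))
```

With imbalanced protocol categories, a macro average weights rare protocols as heavily as common ones, whereas a weighted or micro average would not; which of these underlies the reported scores cannot be determined from this page.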
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations