This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Medical concept understanding in large language models is fragmented
Citations: 0
Authors: 3
Year: 2026
Abstract
Large language models (LLMs) perform strongly across a wide range of medical applications, yet it remains unclear whether such success reflects genuine understanding of medical concepts. We present an ontology-grounded, concept-centered evaluation of medical concept understanding in LLMs. Using 6,252 phenotype concepts from the Human Phenotype Ontology, we decompose concept understanding into three core dimensions (concept identity, concept hierarchy, and concept meaning) and design corresponding benchmarks for each dimension. Across a representative set of contemporary LLMs, the best-performing models achieve high accuracy on concept identity (90.6%) and hierarchy (83.8%), but lower performance on concept meaning (72.6%). Concept-level analysis reveals substantial fragmentation in LLM understanding: only 57.7% of concepts are consistently understood across all three dimensions, while 41.3% show partial understanding and 1.1% are not captured in any dimension. These results demonstrate that strong application-level performance of LLMs can mask fundamental gaps in concept-level understanding, highlighting the need for ontology-grounded evaluation in medical AI.
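The concept-level fragmentation analysis described in the abstract reduces to a simple tally: each concept passes or fails each of the three dimension benchmarks, and concepts are bucketed by how many dimensions they pass. Below is a minimal Python sketch of that tally, assuming hypothetical per-concept boolean results; the HPO IDs shown and the coverage helper are illustrative, not taken from the paper.

```python
from collections import Counter

# Hypothetical per-concept benchmark outcomes: did the model pass the
# identity, hierarchy, and meaning tests for each HPO concept?
results = {
    "HP:0001250": {"identity": True,  "hierarchy": True,  "meaning": True},
    "HP:0000821": {"identity": True,  "hierarchy": True,  "meaning": False},
    "HP:0002099": {"identity": False, "hierarchy": False, "meaning": False},
}

def coverage(dims: dict) -> str:
    """Bucket a concept by how many of the three dimensions it passes."""
    passed = sum(dims.values())  # True counts as 1, False as 0
    if passed == 3:
        return "consistent"  # understood in all three dimensions
    if passed == 0:
        return "missing"     # not captured in any dimension
    return "partial"         # understood in some dimensions only

counts = Counter(coverage(dims) for dims in results.values())
total = len(results)
for bucket in ("consistent", "partial", "missing"):
    print(f"{bucket}: {counts[bucket] / total:.1%}")
```

Run over all 6,252 concepts, this kind of tally would yield the paper's reported split (57.7% consistent, 41.3% partial, 1.1% missing).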
Related Works
"Why Should I Trust You?"
2016 · 14,333 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,696 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,221 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,640 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,414 citations