This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Medical concept understanding in large language models is fragmented
Citations: 0
Authors: 3
Year: 2026
Abstract
Large language models (LLMs) perform strongly across a wide range of medical applications, yet it remains unclear whether such success reflects genuine understanding of medical concepts. We present an ontology-grounded, concept-centered evaluation of medical concept understanding in LLMs. Using 6,252 phenotype concepts from the Human Phenotype Ontology, we decompose concept understanding into three core dimensions (concept identity, concept hierarchy, and concept meaning) and design corresponding benchmarks for each dimension. Across a representative set of contemporary LLMs, the best-performing models achieve high accuracy on concept identity (90.6%) and hierarchy (83.8%), but lower performance on concept meaning (72.6%). Concept-level analysis reveals substantial fragmentation in LLM understanding: only 57.7% of concepts are consistently understood across all three dimensions, while 41.3% show partial understanding and 1.1% are not captured in any dimension. These results demonstrate that strong application-level performance of LLMs can mask fundamental gaps in concept-level understanding, highlighting the need for ontology-grounded evaluation in medical AI.
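The concept-level fragmentation analysis described in the abstract reduces to a simple tally: each concept passes or fails each of the three dimension benchmarks, and concepts are bucketed by how many dimensions they pass. Below is a minimal Python sketch of that tally, assuming hypothetical per-concept boolean results; the HPO IDs shown and the coverage helper are illustrative, not taken from the paper.

```python
from collections import Counter

# Hypothetical per-concept benchmark outcomes: did the model pass the
# identity, hierarchy, and meaning tests for each HPO concept?
results = {
    "HP:0001250": {"identity": True,  "hierarchy": True,  "meaning": True},
    "HP:0000821": {"identity": True,  "hierarchy": True,  "meaning": False},
    "HP:0002099": {"identity": False, "hierarchy": False, "meaning": False},
}

def coverage(dims: dict) -> str:
    """Bucket a concept by how many of the three dimensions it passes."""
    passed = sum(dims.values())  # True counts as 1, False as 0
    if passed == 3:
        return "consistent"  # understood in all three dimensions
    if passed == 0:
        return "missing"     # not captured in any dimension
    return "partial"         # understood in some dimensions only

counts = Counter(coverage(dims) for dims in results.values())
total = len(results)
for bucket in ("consistent", "partial", "missing"):
    print(f"{bucket}: {counts[bucket] / total:.1%}")
```

Run over all 6,252 concepts, this kind of tally would yield the paper's reported split (57.7% consistent, 41.3% partial, 1.1% missing).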
Related Works
"Why Should I Trust You?"
2016 · 14,333 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,696 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,221 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,640 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,414 citations