This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Representation Format Effects on Small Language Model Diagnostic Fidelity in Primary Care Pipelines: A Three-Arm Paired Simulation Protocol with a Flat, FHIR, and openEHR Illustration

2026 · 0 citations · Open Access

Abstract

Background. Primary care deploys small language models (SLMs) in sequence. Whether input representation modulates pipeline fidelity, and whether FHIR wrapping or archetype-based structure is required, is not established at small-model scale.

Methods. We establish a three-arm paired simulation protocol. Each of 50 Synthea patients was processed through a Triage → Coder SLM pipeline (Gemma 3 1B, Qwen3 1.7B) under three conditions: flat tabular text, compact FHIR R4 Bundle JSON, and openEHR Composition JSON with four Clinical Knowledge Manager archetypes. Three repeats per condition at temperature 0.5 (450 runs). Compact FHIR was pre-specified to control an input-length confound (full FHIR Bundles run approximately 6.5× openEHR length). Primary outcome: semantic F1 against Synthea condition descriptions. Statistics: Friedman test, pairwise Wilcoxon with Bonferroni correction, paired Cohen's d.

Results. Mean semantic F1: Flat 0.193, FHIR 0.198, openEHR 0.228. Friedman significant (χ² = 7.56, p = 0.023). Bonferroni-corrected pairwise: Flat vs FHIR p = 1.00 (d = +0.03); Flat vs openEHR p = 0.10 (d = +0.29); FHIR vs openEHR p = 0.09 (d = +0.33). Best arm per patient: Flat 10, FHIR 15, openEHR 25. Directionally consistent with an openEHR lift; no pairwise contrast survives Bonferroni at n = 50.

Conclusions. The protocol establishes a reproducible instrument for representation-format effects in multi-SLM clinical pipelines. The n = 50 illustration gives preliminary evidence consistent with long-standing arguments that ontological richness, not mere structural wrapping, drives semantic preservation through a language model cascade. Findings invite collaborative follow-up on larger, non-synthetic cohorts.
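The design and statistics named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the author's analysis code: the function names and the toy F1 values are hypothetical, the Friedman statistic is computed without a tie correction, and the paired Cohen's d is assumed to be the d_z variant (mean paired difference over its standard deviation), which the abstract does not specify. The Wilcoxon signed-rank test is omitted here; in practice a statistics library would supply it.

```python
import math
from itertools import product

# Design from the abstract: 50 patients x 3 formats x 3 repeats.
ARMS = ("flat", "fhir", "openehr")
N_PATIENTS, N_REPEATS = 50, 3
runs = list(product(range(N_PATIENTS), ARMS, range(N_REPEATS)))


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(rows):
    """Friedman statistic for n blocks (patients) x k conditions, no tie correction."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def paired_cohens_d(a, b):
    """Mean of paired differences (b - a) over their sample SD (assumed d_z variant)."""
    diffs = [y - x for x, y in zip(a, b)]
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)
    return mean / math.sqrt(var)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    return [min(1.0, p * len(p_values)) for p in p_values]


# Hypothetical per-patient mean F1 (flat, fhir, openehr); NOT data from the paper.
toy = [
    [0.10, 0.12, 0.30],
    [0.20, 0.18, 0.25],
    [0.15, 0.22, 0.21],
    [0.05, 0.07, 0.09],
]
toy_chi2 = friedman_chi2(toy)
toy_p = chi2_sf_df2(toy_chi2)
toy_d = paired_cohens_d([r[0] for r in toy], [r[2] for r in toy])
```

With three arms the Friedman statistic has 2 degrees of freedom, so its upper-tail probability reduces to exp(-χ²/2). Evaluating `chi2_sf_df2(7.56)` gives roughly 0.023, which agrees with the p-value reported above, and `len(runs)` reproduces the 450 runs.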

Authors

Florian Odi Stummer

Topics

Simulation-Based Education in Healthcare · Artificial Intelligence in Healthcare and Education · Electronic Health Records Systems
