This is an overview page with metadata about this paper. The full article is available from the publisher.
From raw data to research-ready: A FHIR-based transformation pipeline in a real-world oncology setting
Citations: 1
Authors: 11
Year: 2025
Abstract
The exponential growth of healthcare data, driven by advancements in medical research and digital health technologies, has underscored the critical need for interoperability and standardization. However, the heterogeneous nature of real-world clinical data poses significant challenges to ensuring seamless data exchange and secondary use for research purposes. These challenges include syntactic inconsistencies (e.g., variable use of terminologies like ICD-10 vs SNOMED CT), semantic mismatches (e.g., differing conceptualizations of disease staging across institutions), and structural fragmentation (e.g., laboratory results encoded in free text rather than structured fields). Fast Healthcare Interoperability Resources (FHIR) has emerged as a leading standard for structuring and harmonizing healthcare data, enabling integration across diverse systems. This work presents a FHIR-based transformation pipeline that leverages the Resource Description Framework (RDF) to convert raw, conceptually heterogeneous oncology data into research-ready, semantically enriched datasets. By representing FHIR resources as RDF graphs, our approach enables semantic interoperability, enhances data linkage across heterogeneous sources, and supports automated reasoning through ontology-based queries and inference mechanisms. The pipeline employs a templated conversion strategy in which mappings are defined declaratively, allowing domain experts to focus on the data model. In Cancer Virtual Lab, we applied this methodology to a real-world oncology dataset comprising 36,335 anonymized patient records, successfully converting 1,093,705 clinical records into 1,151,559 distinct RDF-based FHIR resources. The process incorporated syntactic and semantic validation, along with expert review, to ensure technical correctness and clinical relevance.
Our results demonstrate the feasibility of semantically integrating oncology data using FHIR and RDF, fostering machine-readable, interoperable knowledge representation. This enriched representation supports data quality monitoring and improvement, data harmonization, longitudinal analysis, advanced analytics, and AI-driven decision support, promoting large-scale secondary use.
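The abstract describes a templated conversion strategy in which declarative mappings turn raw records into RDF-based FHIR resources. The sketch below is purely illustrative and is not the authors' implementation. It assumes a hypothetical flat diagnosis table (columns `patient_id`, `diagnosis_code`, `diagnosis_date`), uses Python's standard `string.Template` as the templating mechanism, and emits a simplified Turtle shape loosely modelled on the FHIR RDF representation (the `fhir:v` value convention follows FHIR R5). The normative form, including `fhir:nodeRole` and list handling, should be checked against the FHIR RDF specification.

```python
from string import Template

# Turtle prefixes emitted once at the top of the output graph.
PREFIXES = (
    "@prefix fhir: <http://hl7.org/fhir/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)

# Hypothetical declarative mapping: the template describes what a Condition
# looks like; $placeholders are filled from columns of a raw record.
# Column names and the simplified FHIR RDF shape are assumptions.
CONDITION_TEMPLATE = Template("""\
<$base/Condition/$id> a fhir:Condition ;
    fhir:subject [ fhir:reference [ fhir:v "Patient/$patient_id" ] ] ;
    fhir:code [ fhir:coding ( [
        fhir:system [ fhir:v "http://hl7.org/fhir/sid/icd-10"^^xsd:anyURI ] ;
        fhir:code [ fhir:v "$diagnosis_code" ]
    ] ) ] ;
    fhir:onsetDateTime [ fhir:v "$diagnosis_date"^^xsd:date ] .
""")


def escape_literal(value: str) -> str:
    """Escape characters that would break a double-quoted Turtle literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def render(records, template, base="http://example.org/fhir"):
    """Render each raw record dict into a Turtle snippet via the template.

    A missing required column raises KeyError from Template.substitute,
    which acts only as a crude syntactic check (hypothetical helper).
    """
    chunks = [PREFIXES]
    for i, rec in enumerate(records):
        values = {k: escape_literal(str(v)) for k, v in rec.items()}
        # Keyword arguments take precedence over keys in `values`.
        chunks.append(template.substitute(values, base=base, id=f"cond-{i}"))
    return "\n".join(chunks)


if __name__ == "__main__":
    # Illustrative record; C50.9 is the ICD-10 code for breast cancer, unspecified.
    rows = [{"patient_id": "p001", "diagnosis_code": "C50.9",
             "diagnosis_date": "2019-03-14"}]
    print(render(rows, CONDITION_TEMPLATE))
```

Keeping the mapping in a template separates the data model (what a Condition looks like) from the traversal code, which is the property the abstract attributes to the declarative approach. A real pipeline would add proper syntactic and semantic validation, for example against the ShEx shapes published with FHIR, rather than relying on missing-field errors.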
Similar works
New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1)
2008 · 28,934 citations
TNM Classification of Malignant Tumours
1987 · 16,123 citations
A survey on deep learning in medical image analysis
2017 · 13,616 citations
Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening
2011 · 10,776 citations
The American Joint Committee on Cancer: the 7th Edition of the AJCC Cancer Staging Manual and the Future of TNM
2010 · 9,111 citations