Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

BAE-ViT: An Efficient Multimodal Vision Transformer for Bone Age Estimation

2024·8 Zitationen·TomographyOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2024

Jahr

Abstract

This research introduces BAE-ViT, a specialized vision transformer model developed for bone age estimation (BAE). This model is designed to efficiently merge image and sex data, a capability not present in traditional convolutional neural networks (CNNs). BAE-ViT employs a novel data fusion method to facilitate detailed interactions between visual and non-visual data by tokenizing non-visual information and concatenating all tokens (visual or non-visual) as the input to the model. The model underwent training on a large-scale dataset from the 2017 RSNA Pediatric Bone Age Machine Learning Challenge, where it exhibited commendable performance, particularly excelling in handling image distortions compared to existing models. The effectiveness of BAE-ViT was further affirmed through statistical analysis, demonstrating a strong correlation with the actual ground-truth labels. This study contributes to the field by showcasing the potential of vision transformers as a viable option for integrating multimodal data in medical imaging applications, specifically emphasizing their capacity to incorporate non-visual elements like sex information into the framework. This tokenization method not only demonstrates superior performance in this specific task but also offers a versatile framework for integrating multimodal data in medical imaging applications.

Autoren

Institutionen

Themen

COVID-19 diagnosis using AIDomain Adaptation and Few-Shot LearningArtificial Intelligence in Healthcare and Education

Volltext beim Verlag öffnen

BAE-ViT: An Efficient Multimodal Vision Transformer for Bone Age Estimation

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen