This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Comparative Performance of Large Language Models in Muscle Histology Classification Highlights Enhanced Accuracy of ChatGPT-4o in Tissue Identification
Citations: 1
Authors: 5
Year: 2025
Abstract
Introduction
One of the most promising avenues for integrating artificial intelligence (AI) into medicine is the examination, evaluation, and characterization of pathology slides. The use of large language models (LLMs), an increasingly popular subtype of AI model, in pathology applications remains largely unexplored. This study investigates the histological image recognition capabilities of the multimodal models Gemini 1.5 Flash, ChatGPT-4o, and Claude 3.5 Sonnet and assesses their suitability for clinical or medical education use.
Methods
The models were evaluated on 300 digital histology images derived from the University of South Florida Morsani College of Medicine Virtual Microscopy database, with a prompt designed to ascertain each model's ability to identify the tissue type and the plane of sectioning. The images included the three muscle subtypes in both longitudinal and transverse planes of sectioning. Standard machine learning metrics, including precision, recall, accuracy, and F1 score, were used to evaluate each model's performance.
Results
In the prediction of tissue type, OpenAI's ChatGPT achieved the highest metrics with an F1 score of 0.772, while Claude yielded an F1 score of 0.380 and Gemini an F1 score of 0.460. In the prediction of the plane of sectioning, ChatGPT produced an F1 score of 0.396, while Claude produced 0.472 and Gemini 0.344.
Conclusion
Overall, the results indicate that ChatGPT is the most effective at identifying tissues. However, its lower accuracy in identifying the plane of sectioning relative to the other models leaves room for improvement across varying tissue samples before it can reliably supplement medical education or clinical use.
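The abstract reports macro-level F1 scores for a multi-class classification task. As a minimal sketch of how such metrics are typically computed, the helper below derives per-class precision, recall, and F1 plus a macro-averaged F1 and overall accuracy from label lists; the function name and the example labels (the three muscle subtypes) are illustrative assumptions, not taken from the paper's code.

```python
def classification_metrics(y_true, y_pred):
    """Per-class precision/recall/F1, macro-averaged F1, and accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        per_class[c] = (precision, recall, f1)
    # Macro averaging weights every class equally, regardless of support.
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return per_class, macro_f1, accuracy

# Hypothetical example with the three muscle subtypes:
y_true = ["skeletal", "cardiac", "smooth", "skeletal"]
y_pred = ["skeletal", "cardiac", "skeletal", "skeletal"]
per_class, macro_f1, accuracy = classification_metrics(y_true, y_pred)
```

Macro averaging (one of several F1 variants; the paper does not state which it used) treats each class equally, so a model that misclassifies a rare subtype is penalized as heavily as one that misclassifies a common one.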
Related works
A survey on deep learning in medical image analysis
2017 · 13,697 citations
Dermatologist-level classification of skin cancer with deep neural networks
2017 · 13,287 citations
A survey on Image Data Augmentation for Deep Learning
2019 · 11,881 citations
QuPath: Open source software for digital pathology image analysis
2017 · 8,262 citations
Radiomics: Images Are More than Pictures, They Are Data
2015 · 8,056 citations