Performance of Large Language Models in Supporting Medical Diagnosis and Treatment: An Evaluation on the 2024 PNA Exam
Citations: 0
Authors: 4
Year: 2025
Abstract
The integration of Large Language Models (LLMs) into healthcare holds significant potential to enhance diagnostic accuracy and support medical treatment planning. This study evaluates the performance of a range of contemporary LLMs on the 2024 Portuguese National Exam for medical specialty access (PNA), a standardized assessment of medical knowledge. Our results show considerable variation in accuracy and cost-effectiveness, with several models performing comparably to, or better than, human benchmarks for medical students on this specific task. We analyze the leading models using a combined score of accuracy, cost, and potential data contamination risk. We discuss in detail the insights from comprehensive benchmarks such as HealthBench, describing its methodology and its findings on model behavior across diverse health contexts. We further examine reasoning methodologies such as Chain-of-Thought and Chain-of-Draft, as well as emerging model architectures, and underscore the potential for LLMs to serve as valuable complementary tools for medical professionals within a robust ethical and regulatory framework.
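The abstract does not state how accuracy, cost, and contamination risk are combined into one score. As a purely hypothetical sketch (the weights, the cost normalization, the `contamination_risk` field, and the model names are all assumptions, not the authors' method), a weighted ranking of models could look like this:

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    accuracy: float            # fraction of exam questions answered correctly, 0..1
    cost_usd: float            # total API cost to answer the whole exam
    contamination_risk: float  # HYPOTHETICAL 0..1 estimate; higher = exam more likely seen in training


def combined_scores(results, w_acc=0.6, w_cost=0.2, w_risk=0.2):
    """Hypothetical weighted score (higher is better); the weights are illustrative assumptions."""
    max_cost = max(r.cost_usd for r in results)
    scores = {}
    for r in results:
        # Normalize cost relative to the most expensive model, then invert so cheaper scores higher.
        cost_term = 1.0 - (r.cost_usd / max_cost if max_cost > 0 else 0.0)
        # Invert contamination risk so that lower risk scores higher.
        risk_term = 1.0 - r.contamination_risk
        scores[r.name] = w_acc * r.accuracy + w_cost * cost_term + w_risk * risk_term
    return scores


def rank(scores):
    """Return model names ordered from best to worst combined score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    results = [
        ModelResult("model-a", accuracy=0.85, cost_usd=4.0, contamination_risk=0.5),
        ModelResult("model-b", accuracy=0.80, cost_usd=1.0, contamination_risk=0.2),
    ]
    scores = combined_scores(results)
    print(rank(scores), scores)
```

With these made-up values the slightly less accurate but cheaper, lower-risk `model-b` ranks first, which illustrates how a combined score can reorder models relative to raw accuracy; the outcome depends entirely on the chosen weights.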
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,549 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,443 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,941 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations