This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of 5 AI Models on United States Medical Licensing Examination Step 1 Questions: Comparative Observational Study
Citations: 0
Authors: 6
Year: 2026
Abstract
AI models showed varying strengths across domains, with Grok demonstrating the highest accuracy and consistency in this dataset, particularly for image-based and reasoning-heavy questions. Although ChatGPT-4 remains widely used, newer models like Grok and Copilot also performed competitively. Continuous evaluation is essential as AI tools rapidly evolve.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,534 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,423 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,917 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,582 citations