OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 19.05.2026, 11:30

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

How valuable are the questions and answers generated by large language models in oral and maxillofacial surgery?

2025·3 Zitationen·PLoS ONEOpen Access
Volltext beim Verlag öffnen

3

Zitationen

5

Autoren

2025

Jahr

Abstract

INTRODUCTION: In this study, we aim to evaluate the ability of large language models (LLM) to generate questions and answers in oral and maxillofacial surgery. METHODS: ChatGPT4, ChatGPT4o, and Claude3-Opus were evaluated in this study. Each LLM was instructed to generate 50 questions about oral and maxillofacial surgery. Three LLMs were asked to answer the generated 150 questions. RESULTS: All 150 questions generated by the three LLMs were related to oral and maxillofacial surgery. Each model exhibited a correct answer rate of over 90%. None of the three models were able to answer correctly all the questions they generated themselves. The correct answer rate was 97.0% for questions with figures, significantly higher than the 88.9% rate for questions without figures. The analysis of problem-solving by the three LLMs showed that each model generally inferred answers with high accuracy, and there were few logical errors that could be considered controversial. Additionally, all three scored above 88% for the fidelity of their explanations. CONCLUSION: This study demonstrates that while LLMs like ChatGPT4, ChatGPT4o, and Claude3-Opus exhibit robust capabilities in generating and solving oral and maxillofacial surgery questions, their performance is not without limitations. None of the models were able to answer correctly all the questions they generated themselves, highlighting persistent challenges such as AI hallucinations and contextual understanding gaps. The results also emphasize the importance of multimodal inputs, as questions with annotated images achieved higher accuracy rates compared to text-only prompts. Despite these shortcomings, the LLMs showed significant promise in problem-solving, logical consistency, and response fidelity, particularly in structured or numerical contexts.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationTopic ModelingMultimodal Machine Learning Applications
Volltext beim Verlag öffnen