Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
How valuable are the questions and answers generated by large language models in oral and maxillofacial surgery?
3
Zitationen
5
Autoren
2025
Jahr
Abstract
INTRODUCTION: In this study, we aim to evaluate the ability of large language models (LLM) to generate questions and answers in oral and maxillofacial surgery. METHODS: ChatGPT4, ChatGPT4o, and Claude3-Opus were evaluated in this study. Each LLM was instructed to generate 50 questions about oral and maxillofacial surgery. Three LLMs were asked to answer the generated 150 questions. RESULTS: All 150 questions generated by the three LLMs were related to oral and maxillofacial surgery. Each model exhibited a correct answer rate of over 90%. None of the three models were able to answer correctly all the questions they generated themselves. The correct answer rate was 97.0% for questions with figures, significantly higher than the 88.9% rate for questions without figures. The analysis of problem-solving by the three LLMs showed that each model generally inferred answers with high accuracy, and there were few logical errors that could be considered controversial. Additionally, all three scored above 88% for the fidelity of their explanations. CONCLUSION: This study demonstrates that while LLMs like ChatGPT4, ChatGPT4o, and Claude3-Opus exhibit robust capabilities in generating and solving oral and maxillofacial surgery questions, their performance is not without limitations. None of the models were able to answer correctly all the questions they generated themselves, highlighting persistent challenges such as AI hallucinations and contextual understanding gaps. The results also emphasize the importance of multimodal inputs, as questions with annotated images achieved higher accuracy rates compared to text-only prompts. Despite these shortcomings, the LLMs showed significant promise in problem-solving, logical consistency, and response fidelity, particularly in structured or numerical contexts.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.700 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.613 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8.159 Zit.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6.875 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.