This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Evaluating artificial intelligence large language models’ performances in a South African high school chemistry exam
Citations: 3
Authors: 1
Year: 2025
Abstract
Gemini, ChatGPT Plus, and Claude 3.5 Sonnet are artificial intelligence (AI) chatbots with potential in education. Their capabilities, such as acting as virtual teaching assistants, offering personalized responses to learners’ queries, and summarizing content, make them versatile tools with the potential to assist learners. The chemistry section of physical sciences in South Africa is often considered challenging, and learners could benefit from virtual teaching assistants to supplement traditional instruction. However, little is known about AI chatbots’ abilities in solving high school chemistry problems. This descriptive case study examined the capabilities of Gemini, Claude 3.5 Sonnet, and ChatGPT Plus in accurately answering questions from the final grade 12 physical sciences chemistry exam in South Africa. The conceptual framework that guided the study was Bloom’s taxonomy of educational objectives. The responses were rigorously evaluated using the same criteria and rubrics applied to the candidates that year, ensuring a fair and robust comparison. The findings were that ChatGPT Plus performed at 47%, Gemini at 51% and Claude 3.5 Sonnet at 65%. All chatbots performed above the average performance of the candidates who sat for the paper that year, which was 46%. This has significant implications for policymakers, teachers, and learners regarding integrating large language models in teaching physical sciences and exam preparation.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,626 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,532 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,046 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,843 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations