This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
How reliable are large language models in analyzing the quality of written lesson plans? A mixed-methods study from a teacher internship program
Citations: 0
Authors: 2
Year: 2025
Abstract
This study investigates the reliability of Large Language Models (LLMs) in evaluating the quality of written lesson plans from pre-service teachers. A total of 32 lesson plans, each ranging from 60 to 100 pages, were collected during a teacher internship program for civic education pre-service teachers. Using the ChatGPT-o1 reasoning model, we compared a human expert standard with LLM coding outcomes in a two-phase explanatory sequential mixed-methods design that combined quantitative reliability testing with a qualitative follow-up analysis to interpret inter-dimensional patterns of agreement. Quantitatively, overall reliability across six qualitative components of written lesson plans (Content Transformation, Task Creation, Adaptation, Goal Clarification, Contextualization, and Sequencing) reached moderate alignment in identifying explicit instructional features (α = .689; 73.8% exact agreement). Qualitative analyses further revealed that the LLM struggled with high-inferential criteria, such as the depth of pedagogical reasoning and the coherence of instructional decisions, as it often relied on surface-level textual cues rather than deeper contextual understanding. These findings indicate that LLMs can support teacher educators and educational researchers as a design-stage screening tool, but human judgment remains essential for interpreting complex pedagogical constructs in written lesson plans and for ensuring the ethical and pedagogical integrity of evaluation processes. We outline implications for integrating LLM-based analysis into teacher education and emphasize improved prompt design and systematic human oversight to ensure reliable qualitative use.
• Focus on reliability of LLM rating of written lesson plans (CODE-PLAN)
• Explanatory sequential design: quantitative α, then follow-up qualitative analysis
• LLMs align with experts on explicit, surface-identifiable plan features
• Lower reliability on high-inference pedagogy and instructional coherence
• Practice guidance: LLM-supported mentor–mentee feedback workflow
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations