This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
How reliable are large language models in analyzing the quality of written lesson plans? A mixed-methods study from a teacher internship program
Citations: 0
Authors: 2
Year: 2025
Abstract
This study investigates the reliability of Large Language Models (LLMs) in evaluating the quality of written lesson plans from pre-service teachers. A total of 32 lesson plans, each ranging from 60 to 100 pages, were collected during a teacher internship program for civic education pre-service teachers. Using the ChatGPT-o1 reasoning model, we compared a human expert standard with LLM coding outcomes in a two-phase explanatory sequential mixed-methods design that combined quantitative reliability testing with a qualitative follow-up analysis to interpret inter-dimensional patterns of agreement. Quantitatively, overall reliability across six qualitative components of written lesson plans (Content Transformation, Task Creation, Adaptation, Goal Clarification, Contextualization, and Sequencing) reached moderate alignment in identifying explicit instructional features (α = .689; 73.8% exact agreement). Qualitative analyses further revealed that the LLM struggled with high-inferential criteria, such as the depth of pedagogical reasoning and the coherence of instructional decisions, as it often relied on surface-level textual cues rather than deeper contextual understanding. These findings indicate that LLMs can support teacher educators and educational researchers as a design-stage screening tool, but human judgment remains essential for interpreting complex pedagogical constructs in written lesson plans and for ensuring the ethical and pedagogical integrity of evaluation processes. We outline implications for integrating LLM-based analysis into teacher education and emphasize improved prompt design and systematic human oversight to ensure reliable qualitative use.
• Focus on reliability of LLM rating of written lesson plans (CODE-PLAN)
• Explanatory sequential design: quantitative α, then follow-up qualitative analysis
• LLMs align with experts on explicit, surface-identifiable plan features
• Lower reliability on high-inference pedagogy and instructional coherence
• Practice guidance: LLM-supported mentor–mentee feedback workflow
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations