This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Psychometrics and Linguistics in ChatGPT-Generated Reading Tests Compared with CET-4
Citations: 0
Authors: 4
Year: 2026
Abstract
ChatGPT has shown considerable potential for Automated Item Generation, but the quality of ChatGPT-generated items in language assessment remains insufficiently substantiated. This research recruited 121 participants to systematically compare the psychometric properties of the test items and the linguistic features of the reading passages in ChatGPT-generated and official CET-4 reading comprehension materials, using Item Response Theory and Coh-Metrix. Key findings are as follows: (1) generated items fell short in higher-order reading skills; (2) the generated items were less difficult than official ones, showing weaker discrimination and providing measurement information mainly for lower-performing students; (3) only 22.9% of distractors functioned effectively, indicating insufficient distractor performance; and (4) ChatGPT-generated passages were characterized by irregular lexical distribution, higher lexical complexity, and weaker cohesion, but simpler sentences, than CET-4 passages. Although ChatGPT-generated passages were less readable than CET-4 passages, the corresponding items were easier and showed lower discrimination. This discrepancy can be attributed to inadequate distractor functioning that facilitates option elimination without complete passage comprehension, as well as to the underrepresentation of higher-order reading skills. The findings support the conclusion that ChatGPT may function effectively as a supplementary tool in low-stakes assessment; however, substantial refinements in item quality are imperative before its application in high-stakes testing.
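To illustrate finding (2), the following is a minimal sketch of how item difficulty and discrimination shape measurement information under Item Response Theory. The abstract does not state which IRT model the study fitted; the sketch assumes a two-parameter logistic (2PL) model, and the parameter values `a` (discrimination) and `b` (difficulty) are hypothetical, not taken from the paper.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical item parameters (assumptions for illustration only).
easy_weak = {"a": 0.8, "b": -1.0}   # easier, less discriminating item
hard_sharp = {"a": 1.6, "b": 0.5}   # harder, more discriminating item

# Compare the information each item contributes across ability levels.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    info_easy = item_information(theta, **easy_weak)
    info_hard = item_information(theta, **hard_sharp)
    print(f"theta={theta:+.1f}  easy/weak={info_easy:.3f}  hard/sharp={info_hard:.3f}")
```

Under this model an item's information peaks where ability equals its difficulty, and the peak height scales with the square of its discrimination. An easy, weakly discriminating item therefore contributes most of its (modest) information at the lower end of the ability scale, which is consistent with the pattern the abstract reports for the generated items.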
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,774 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,685 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,244 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations