OpenAlex · Updated hourly · Last updated: 29.03.2026, 00:17

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ChatGPT does not replicate human moral judgments: the importance of examining metrics beyond correlation to assess agreement

2025 · 2 citations · Scientific Reports · Open Access

Citations: 2 · Authors: 7 · Year: 2025

Abstract

The rise of generative artificial intelligence has prompted claims that large language models (LLMs) can substitute for human participants, particularly in moral judgment tasks where correlations between ChatGPT and humans approach r = 1.00. In response, we conducted a pre-registered study in which two LLMs (text-davinci-003 and GPT-4o) predicted human moral judgments of 60 scenarios before a large human sample (N = 940) rated them. Despite strong correlations, difference scores revealed substantial, systematic errors: compared to humans, LLMs provided more extreme morality ratings of moral and neutral scenarios and more extreme immorality ratings of immoral ones. Moreover, ChatGPT differed significantly, with moderate to large effect sizes, from human averages on ~87% of scenarios. Further, LLM ratings clustered around a restricted number of values, failing to reflect human variability. Re-examination of earlier published data also reflected this clumping. We conclude that broader evaluation criteria are needed for comparing LLM predictions and human responses in moral reasoning tasks.
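The abstract's central methodological point is that a near-perfect Pearson correlation can coexist with large, systematic disagreement in the actual rating values. A minimal sketch with invented numbers (not the paper's data) illustrates this: two rating vectors that are linearly related have r = 1.00, yet the second is systematically more extreme, which only a difference-score metric detects.

```python
# Illustrative sketch with hypothetical ratings (not data from the study):
# a perfectly correlated but more extreme rater disagrees on level, not order.
import statistics

human = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # hypothetical human mean ratings
llm = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0]     # linearly related, more extreme

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(human, llm)                                        # correlation metric
mad = statistics.fmean(abs(a - b) for a, b in zip(human, llm))   # difference metric
print(round(r, 3), round(mad, 3))  # → 1.0 0.929
```

Correlation is invariant to linear rescaling, so it rewards preserved rank order while hiding the systematic shift toward the extremes; the mean absolute difference exposes it.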

Related works