This is an overview page with metadata for this scientific work. The full article is available from the publisher.
AI-driven speech act annotation: accuracy and reproducibility across ChatGPT, LadderWeb and LLaMA
Citations: 0 · Authors: 3 · Year: 2026
Abstract
This study evaluates three machine learning systems for annotating pragmatic categories, focusing on cancellations after accepting an invitation. The systems comprise the supervised model LadderWeb and the pre-trained models ChatGPT-4o and LLaMA-3.2. LadderWeb, built on Apache OpenNLP, was specifically designed for cancellation annotation. ChatGPT-4o was tested through a web interface to simulate non-expert use, while LLaMA-3.2 was run locally to ensure control, reproducibility, and data security. Both large language models were prompted using a few-shot learning approach (Brocca et al., in review). System outputs were compared against a human baseline. ChatGPT-4o achieved the highest agreement across dimensions, with κ values ranging from substantial to almost perfect. LadderWeb also showed substantial agreement, whereas LLaMA-3.2 performed considerably worse. Repeated testing after seven months revealed that ChatGPT-4o's results varied, though accuracy remained high, while LadderWeb and LLaMA-3.2 produced self-consistent outputs. Notably, LLaMA-3.2 improved when its parameters were adjusted. These findings highlight the potential of pre-trained large language models such as ChatGPT-4o to support pragmatic corpus annotation, while also emphasizing their reproducibility challenges, an issue not observed with LadderWeb or LLaMA-3.2.