This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Large language models and questions from older adults: a human and machine-based evaluation study
Citations: 1
Authors: 8
Year: 2025
Abstract
Large language models (LLMs) have the potential to offer substantial advantages in generating and comprehending information. This study evaluates how effectively these models meet the needs of older adults by examining the responses of ChatGPT, Google Gemini, Claude AI, Microsoft Copilot, and Google Search to 23 questions. The responses were evaluated for accuracy, comprehensibility, relevance, and conciseness, with both the LLMs and a panel of seven researchers assessing the answers. ChatGPT received the highest ratings for accuracy and comprehensibility, Google Gemini for conciseness, and ChatGPT and Claude AI jointly for reliability. These ratings were further analysed to compare the LLMs' evaluations with those of the researchers. The LLMs awarded ratings of 4 or 5 most of the time, whereas the researchers' ratings were more varied. Microsoft Copilot aligned most closely with the researchers' evaluations of accuracy and comprehensibility, while Claude AI and ChatGPT showed the closest alignment for conciseness and relevance, respectively. Furthermore, to identify which platform may be best suited to different types of information, the questions were divided into five categories; ChatGPT emerged as the best-suited LLM in most categories. These findings, along with the rubric and research methodologies used in this study, can be replicated to assess the performance of LLMs across other research areas and domains.
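The abstract reports which LLM "most closely aligned" with the researchers but does not say how agreement was measured. The sketch below shows one possible way to make such a comparison. It assumes a 1 to 5 rating scale, treats the per-question mean of the researchers' ratings as the panel consensus, and uses mean absolute deviation from that consensus as the agreement measure. These are all assumptions, not the study's stated method, and the rater names and numbers are hypothetical toy data.

```python
from statistics import mean

# Hypothetical layout: each inner list holds one rater's scores (1-5) for ONE
# criterion (e.g. accuracy) across the same ordered set of questions.
# Names and numbers are illustrative only, not data from the study.


def panel_consensus(researcher_ratings: list[list[int]]) -> list[float]:
    """Per-question mean of the researcher ratings (assumes every researcher rated every question)."""
    return [mean(scores) for scores in zip(*researcher_ratings)]


def mean_abs_deviation(llm_ratings: list[int], consensus: list[float]) -> float:
    """Average absolute gap between one rater and the panel consensus; lower means closer alignment."""
    if len(llm_ratings) != len(consensus):
        raise ValueError("rating lists must cover the same questions")
    return mean(abs(a - b) for a, b in zip(llm_ratings, consensus))


def closest_rater(llm_ratings: dict[str, list[int]], researcher_ratings: list[list[int]]) -> str:
    """Name of the LLM rater whose scores deviate least from the panel consensus."""
    consensus = panel_consensus(researcher_ratings)
    return min(llm_ratings, key=lambda name: mean_abs_deviation(llm_ratings[name], consensus))


if __name__ == "__main__":
    # Three researchers rating four questions (toy numbers).
    researchers = [
        [3, 4, 2, 5],
        [4, 4, 3, 5],
        [2, 5, 3, 4],
    ]
    # Hypothetical LLM raters: one always gives 5, one tracks the panel more closely.
    llms = {
        "rater_a": [5, 5, 5, 5],
        "rater_b": [3, 4, 3, 5],
    }
    print(closest_rater(llms, researchers))
```

In this toy data, the rater that always awards 5 drifts furthest from the more varied panel. Mean absolute deviation is only one option. Rank-based or chance-corrected statistics, such as Spearman correlation or a weighted Cohen's kappa, would be equally plausible choices for measuring agreement.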
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,316 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,177 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,575 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,468 citations