This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Large Language Model–Based Chatbots and Agentic AI for Mental Health Counseling: Systematic Review of Methodologies, Evaluation Frameworks, and Ethical Safeguards (Preprint)
Citations: 0
Authors: 4
Year: 2025
Abstract
<sec> <title>BACKGROUND</title> Large language model (LLM)–based chatbots have rapidly emerged as tools for digital mental health (MH) counseling. However, evidence on their methodological quality, evaluation rigor, and ethical safeguards remains fragmented, limiting interpretation of clinical readiness and deployment safety. </sec> <sec> <title>OBJECTIVE</title> This systematic review aimed to synthesize the methodologies, evaluation practices, and ethical or governance frameworks of LLM-based chatbots developed for MH counseling and to identify gaps affecting validity, reproducibility, and translation. </sec> <sec> <title>METHODS</title> We searched Google Scholar, PubMed, IEEE Xplore, and ACM Digital Library for studies published between January 2020 and May 2025. Eligible studies reported original development or empirical evaluation of LLM-driven MH counseling chatbots. We excluded studies that did not involve LLM-based conversational agents, were not focused on counseling or supportive MH communication, or lacked evaluable system outputs or outcomes. Screening and data extraction were conducted in Covidence (Veritas Health Innovation) following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines. Study quality was appraised using a structured traffic light framework across 5 methodological domains (design, dataset reporting, evaluation metrics, external validation, and ethics), with an overall judgment derived across domains. We used narrative synthesis with descriptive aggregation to summarize methodological trends, evaluation metrics, and governance considerations. </sec> <sec> <title>RESULTS</title> Twenty studies met the inclusion criteria. GPT-based models (GPT-2/3/4) were used in 45% (9/20) of studies, while 90% (18/20) used fine-tuned or domain-adapted models such as LLaMA, ChatGLM, or Qwen. 
Reported deployment types were not mutually exclusive; standalone apps were most common (18/20, 90%), and some systems were also implemented as virtual agents (4/20, 20%) or delivered via existing platforms (2/20, 10%). Evaluation approaches were frequently mixed, with qualitative assessment (13/20, 65%), such as thematic analysis or rubric-based scoring, often complemented by quantitative language metrics (18/20, 90%), including BLEU (Bilingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), or perplexity. Quality appraisal indicated consistently low risk for dataset reporting and evaluation metrics, but recurring limitations were observed in external validation and reporting on ethics and safety, including incomplete documentation of safety safeguards and governance practices. No included study reported registered randomized controlled trials or independent clinical validation in real-world care settings. </sec> <sec> <title>CONCLUSIONS</title> LLM-based MH counseling chatbots show promise for scalable and personalized support, but current evidence is limited by heterogeneous study designs, minimal external validation, and inconsistent reporting of safety and governance practices. Future work should prioritize clinically grounded evaluation frameworks, transparent reporting of model and prompt configurations, and stronger validation using standardized outcomes to support safe, reliable, and regulatory-ready deployment. </sec> <sec> <title>CLINICALTRIAL</title> <p/> </sec>
Related Works
Amazon's Mechanical Turk
2011 · 10,025 citations
The Transtheoretical Model of Health Behavior Change
1997 · 7,672 citations
COVID-19 and mental health: A review of the existing literature
2020 · 3,704 citations
Cognitive Therapy and the Emotional Disorders
1977 · 2,931 citations
Mental health problems and social media exposure during COVID-19 outbreak
2020 · 2,786 citations