This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Development and validation of explainable risk assessment system for LLM-driven emotional support chatbot (Emobot) (Preprint)
0
Citations
8
Authors
2026
Year
Abstract
<sec> <title>BACKGROUND</title> Youth mental health needs are substantial: globally, about 1 in 7 adolescents (10–19 years) experience a mental disorder, and suicide is the third leading cause of death among people aged 15–29. Despite need, many young people delay or avoid care due to stigma, privacy concerns, cost, and difficulty accessing timely services. Chat-based support can lower the threshold for disclosure, but safety-critical statements require transparent risk stratification and clear pathways for clinician oversight. Language further affects safety in multilingual communities: Cantonese is comparatively under-resourced for natural language processing, and its colloquial orthography, idioms, and pervasive code-switching can distort risk-cue interpretation when models are trained primarily on English and standard written Chinese. </sec> <sec> <title>OBJECTIVE</title> We developed EmoBot, an explainable three-tier (Tier 1–3) risk stratification pipeline and nurse-/counsellor-facing dashboard for Cantonese-speaking youth, and evaluated agreement with expert triage labels and boundary error patterns. </sec> <sec> <title>METHODS</title> EmoBot uses a hybrid framework combining semantic exemplar retrieval and constrained large language model (LLM) reasoning. An expert-authored, de-identified Cantonese reference corpus is embedded with Sentence-BERT and indexed for nearest-neighbor retrieval (top-k=5). In parallel, the DeepSeek API chat model (deepseek-chat; DeepSeek-V3.2) generates a structured record (tier, category, cues, rationale) from the user message, tier/category definitions, and retrieved exemplars as in-context annotated examples. Inference settings were fixed (temperature=0.1; presence_penalty=0; max_tokens capped) and no fine-tuning was performed. If retrieval and LLM disagree, EmoBot outputs the higher tier while displaying provenance-linked evidence in the dashboard to support nurse-in-the-loop verification and escalation. 
Two sequential validation sets were used (set 1: n=50 pilot stress-test; set 2: n=336 expanded validation; total n=386). Three mental-health–trained experts rated each statement blinded to model output; majority consensus (≥2/3) defined the reference label. </sec> <sec> <title>RESULTS</title> Human majority consensus was achieved for 382/386 statements (99.0%). On consensus-labeled items (n=382), EmoBot matched expert tiers with 95.8% accuracy (366/382; 95% CI 93.7–97.6) and 95.2% macro F1 (95% CI 92.7–97.4). Performance increased from set 1 (80.9% accuracy) to set 2 (97.9% accuracy). All misclassifications were adjacent-tier (±1) with no Tier 1↔Tier 3 confusions; Tier 3 detection remained high (F1 94.7% and 98.6%). </sec> <sec> <title>CONCLUSIONS</title> Explainable, conservative hybrid triage with provenance-linked evidence can closely align with expert judgment for Cantonese youth help-seeking text while supporting nurse-in-the-loop review and escalation. The architecture (localized exemplar corpus + structured model outputs + auditable dashboard) is adaptable to other low-resource languages and multilingual settings where safe risk triage requires both linguistic localization and clinical accountability. </sec> <sec> <title>CLINICALTRIAL</title> Not applicable (nonrandomized expert validation study; no clinical trial registration required). </sec>
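The conservative disagreement rule described in the methods (when retrieval and the LLM disagree, output the higher tier, keeping provenance for nurse review) can be sketched as follows. This is a minimal illustration, not the authors' implementation; all names (`TierAssessment`, `resolve_tier`) and the evidence strings are hypothetical.

```python
# Sketch of EmoBot's conservative tier-resolution rule (hypothetical code,
# not from the paper): two independent tier estimates -- one from exemplar
# retrieval, one from the LLM's structured record -- are merged by taking
# the higher (more cautious) tier, with provenance retained for the dashboard.

from dataclasses import dataclass


@dataclass
class TierAssessment:
    tier: int      # 1 (lowest risk) to 3 (highest risk)
    source: str    # "retrieval" or "llm"
    evidence: str  # provenance-linked evidence shown to the nurse


def resolve_tier(retrieval: TierAssessment, llm: TierAssessment) -> TierAssessment:
    """On disagreement, escalate: return the assessment with the higher tier."""
    return retrieval if retrieval.tier >= llm.tier else llm


# Example: retrieval suggests Tier 2, the LLM flags Tier 3 -> Tier 3 is output.
final = resolve_tier(
    TierAssessment(2, "retrieval", "nearest exemplar (top-k match)"),
    TierAssessment(3, "llm", "cue: explicit self-harm statement"),
)
print(final.tier)  # -> 3
```

Ties resolve to the retrieval assessment here; the paper does not specify tie-breaking, only that the higher tier wins on disagreement.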
Similar works
Amazon's Mechanical Turk
2011 · 10,034 citations
The Transtheoretical Model of Health Behavior Change
1997 · 7,707 citations
COVID-19 and mental health: A review of the existing literature
2020 · 3,710 citations
Cognitive Therapy and the Emotional Disorders
1977 · 2,931 citations
Mental health problems and social media exposure during COVID-19 outbreak
2020 · 2,793 citations