
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Suicide- and crisis-risk detection using large language models in mental-health chatbots

2026 · 0 citations · 19 authors · Repository for Publications and Research Data (ETH Zurich) · Open Access

Abstract

Objective
Large language models (LLMs) are increasingly embedded in mental-health chatbots, yet safe deployment is limited by two unresolved challenges: (1) suicide- and crisis-risk detection lacks a definitive ground truth and is characterized by substantial clinician disagreement, and (2) most evaluations frame risk detection as an offline accuracy task rather than a real-time safety problem. This study aimed to empirically characterize these limitations and to derive design principles for uncertainty-aware, safety-oriented crisis detection in conversational artificial intelligence.

Methods and Analysis
We curated a clinician-labeled dataset of 200 real-world conversation segments drawn from a deployed mental-health chatbot. Five clinical experts independently annotated each segment for suicide- and crisis-related risk. Using a single base LLM, we implemented five prompt-defined detection variants with systematically increasing sensitivity thresholds, without task-specific training or fine-tuning. Models were evaluated against clinician consensus labels to quantify false-negative and false-positive trade-offs. Latency analyses assessed feasibility for real-time, per-turn monitoring.

Results
As sensitivity increased, the false-negative rate decreased monotonically from 87% to 0%, while false-positive rates rose accordingly. High- and extreme-sensitivity variants achieved near-perfect (98.9%) and perfect (100%) recall, demonstrating that near-zero-miss crisis detection from natural language is technically feasible in real time (mean latency <1 s). Importantly, model errors aligned closely with cases of clinician disagreement, indicating that misclassifications predominantly reflect irreducible uncertainty rather than model failure.

Conclusion
Suicide- and crisis-risk detection in conversational systems is inherently uncertain and should be reframed from an accuracy-oriented classification task toward an online, safety-oriented monitoring problem. Within this framing, near-zero-miss detection is achievable but necessarily incurs elevated false positives, motivating architectural rather than purely model-level solutions. We propose an operational emergency mode in which conservative risk detection operates independently from the conversational model, allowing supportive engagement to be maintained under heightened safety constraints. This layered, uncertainty-aware architecture provides a practical pathway for safer deployment of LLM-based mental-health chatbots without reliance on large training datasets or extensive model optimization.
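
The conclusion's key proposal is architectural: a conservative, prompt-defined risk detector monitors every user turn independently of the conversational model and, when risk is flagged, switches the session into an emergency mode rather than ending supportive engagement. The sketch below illustrates that separation under stated assumptions; the identifiers (RiskMonitor, llm_classify, SENSITIVITY_PROMPTS) and the prompt wordings are hypothetical and not drawn from the paper.

```python
# A minimal sketch, assuming a generic Python service, of the layered monitoring
# described in the abstract: a conservative, prompt-defined risk detector runs on
# every user turn, independently of the conversational model, and switches the
# session into an "emergency mode" when risk is flagged. All identifiers and
# prompt wordings below are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"        # unrestricted supportive conversation
    EMERGENCY = "emergency"  # heightened safety constraints, crisis resources surfaced


# Prompt-defined sensitivity variants: one base LLM, increasingly conservative thresholds.
SENSITIVITY_PROMPTS = {
    "balanced": "Flag the message only if it contains explicit suicide or crisis risk.",
    "high": "Flag the message if it plausibly indicates suicide or crisis risk.",
    "extreme": "Flag the message unless suicide or crisis risk can be confidently ruled out.",
}


def llm_classify(detection_prompt: str, message: str) -> bool:
    """Stand-in for one call to the base LLM with a detection prompt.

    A real deployment would send the prompt and the message to the model and parse
    a binary flag; a trivial keyword heuristic keeps this sketch self-contained.
    """
    return any(term in message.lower() for term in ("suicide", "kill myself", "end my life"))


@dataclass
class RiskMonitor:
    sensitivity: str = "extreme"   # near-zero-miss operating point from the Results
    mode: Mode = Mode.NORMAL

    def check_turn(self, user_message: str) -> Mode:
        """Per-turn, real-time check that runs alongside the conversational model."""
        flagged = llm_classify(SENSITIVITY_PROMPTS[self.sensitivity], user_message)
        if flagged:
            # Conservative policy: any flag switches the session into emergency mode;
            # supportive engagement continues, but under heightened safety constraints.
            self.mode = Mode.EMERGENCY
        return self.mode


if __name__ == "__main__":
    monitor = RiskMonitor(sensitivity="extreme")
    print(monitor.check_turn("I had a rough day but talking helps."))            # Mode.NORMAL
    print(monitor.check_turn("I don't see the point, I want to end my life."))   # Mode.EMERGENCY
```

Keeping the detector outside the conversational model mirrors the paper's framing: the high false-positive cost of a near-zero-miss threshold is absorbed by a mode switch rather than by degrading or terminating the conversation itself.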
