Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Protocol For Human Evaluation of Artificial Intelligence Chatbots in Clinical Consultations
2
Zitationen
2
Autoren
2024
Jahr
Abstract
Abstract Background Generative artificial intelligence (AI) technology has the revolutionary potentials to augment clinical practice and telemedicine. The nuances of real-life patient scenarios and complex clinical environments demand a rigorous, evidence-based approach to ensure safe and effective application. Methods We present a protocol for the systematic evaluation of generative AI large language models (LLMs) as chatbots within the context of clinical microbiology and infectious disease consultations. We aim to critically assess the clinical accuracy, comprehensiveness, coherence, and safety of recommendations produced by leading generative AI models, including Claude 2, Gemini Pro, GPT-4.0, and a GPT-4.0-based custom AI chatbot. Discussion A standardised healthcare-specific prompt template is employed to elicit clinically impactful AI responses. Generated responses will be graded by a panel of human evaluators, encompassing a wide spectrum of domain expertise in clinical microbiology and virology and clinical infectious diseases. Evaluations are performed using a 5-point Likert scale across four clinical domains: factual consistency, comprehensiveness, coherence, and medical harmfulness. Our study will offer insights into the feasibility, limitations, and boundaries of generative AI in healthcare, providing guidance for future research and clinical implementation. Ethical guidelines and safety guardrails should be developed to uphold patient safety and clinical standards.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.561 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.452 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.948 Zit.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6.797 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.