This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Are Large Language Model–Based Chatbots Effective in Providing Reliable Medical Advice for Achilles Tendinopathy? An International Multispecialist Evaluation
2025 · 16 authors · 1 citation
Abstract
Background: Large language model (LLM)-based chatbots have shown potential in providing health information and patient education. However, the reliability of these chatbots in offering medical advice for specific conditions like Achilles tendinopathy remains uncertain. Mixed outcomes in the field of orthopaedics highlight the need for further examination of these chatbots' reliability.

Hypothesis: Three leading LLM-based chatbots can provide accurate and complete responses to inquiries related to Achilles tendinopathy.

Study Design: Cross-sectional study.

Methods: Eighteen questions derived from the Dutch clinical guideline on Achilles tendinopathy were posed to 3 leading LLM-based chatbots: ChatGPT 4.0, Claude 2, and Gemini. The responses were incorporated into an online survey assessed by orthopaedic surgeons specializing in Achilles tendinopathy. Responses were evaluated using a 4-point scoring system, where 1 indicates unsatisfactory and 4 indicates excellent. The total scores for the 18 responses were aggregated for each rater and compared across the chatbots. The intraclass correlation coefficient was calculated to assess consistency among the raters' evaluations.

Results: < .001 for both comparisons). Intraclass correlation coefficients indicated poor reliability for ChatGPT 4.0 (0.420) and moderate reliability for Claude 2 (0.522) and Gemini (0.575).

Conclusion: While LLM-based chatbots such as ChatGPT 4.0 can deliver high-quality responses to queries regarding Achilles tendinopathy, the inconsistency among specialist evaluations and the absence of standardized assessment criteria significantly challenge our ability to draw definitive conclusions. These issues underscore the need for a cautious and standardized approach when considering the integration of LLM-based chatbots into clinical settings.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,635 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,543 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,051 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,844 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations