This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Retrieval-Augmented LLMs with Indonesian Clinical Trials Guidelines: A Comparative Study
Citations: 0
Authors: 6
Year: 2025
Abstract
Artificial Intelligence (AI), especially in the form of chatbots powered by Large Language Models (LLMs), has emerged as a significant tool in daily life. Many industries, including healthcare, now use AI chatbots for operational tasks. One such task is diagnosing diseases from a patient's symptoms. In practice, however, LLMs tend to generate incorrect information, so they need to be supplied with more specific medical knowledge. This study therefore aims to improve LLM diagnostic correctness by integrating the Indonesian Clinical Trials Guidelines through Retrieval-Augmented Generation (RAG), which gives the LLMs more context on medical cases. We then compare the diagnostic ability of three popular RAG-enhanced LLMs (GPT, Deepseek, and Qwen). As a contrast, we also compare the Deepseek model with and without RAG to test whether RAG leads to a significant improvement in diagnostic correctness. Our evaluation uses context-related metrics, with ground truth provided by a general medical practitioner. The results show that RAG significantly improves model performance, whereas model type does not affect diagnostic correctness. Judged by mean scores alone, GPT performs best, with all metrics above 0.65, followed by Deepseek with RAG, Qwen, and Deepseek without RAG, but the differences between models were not statistically significant. These findings suggest that integrating local clinical guidelines through RAG can enhance medical reasoning in LLMs regardless of the underlying model: as long as reliable medical context is provided, model type has a more limited impact on performance.
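The retrieval step that the abstract describes can be illustrated with a minimal sketch. This is not the authors' pipeline: the passages are neutral placeholders rather than text from the Indonesian Clinical Trials Guidelines, the bag-of-words cosine similarity stands in for whatever retriever the study used, and the names `GUIDELINE_PASSAGES`, `retrieve`, and `build_prompt` are hypothetical. The sketch only shows how retrieved guideline text might be placed in front of a symptom description before it is sent to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages; NOT taken from the paper or from real guidelines.
GUIDELINE_PASSAGES = [
    "Passage A: fever, headache, and joint pain are discussed together.",
    "Passage B: persistent cough and night sweats are discussed together.",
    "Passage C: frequent urination and excessive thirst are discussed together.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query (assumed toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(symptoms, passages, k=1):
    """Prepend the retrieved context to the symptom description."""
    context = "\n".join(retrieve(symptoms, passages, k))
    return (
        f"Context:\n{context}\n\n"
        f"Patient symptoms: {symptoms}\n"
        "Suggested diagnosis:"
    )


if __name__ == "__main__":
    print(build_prompt("patient has a persistent cough with night sweats",
                       GUIDELINE_PASSAGES))
```

In a real system the string returned by `build_prompt` would be passed to the chosen LLM; comparing that output with the output for the bare symptom description mirrors the with-RAG versus without-RAG contrast mentioned in the abstract.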