This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Medical QA dialogue datasets in RAG systems performance evaluation and ChatGPT optimization

2025 · 2 citations · Scientific Reports · Open Access

Abstract

This study evaluates the effectiveness of Chinese doctor-patient dialogues as retrieval sources for Retrieval-Augmented Generation (RAG) in clinical question answering. Using ChatGPT-3.5 as a baseline and extending to GPT-4o and GPT-5, we compare multiple retrieval pipelines, including dense retrieval, Cross-Encoder reranking, Reciprocal Rank Fusion (RRF), and Cascade RRF→Rerank. Experimental results show that dialogue-based retrieval significantly improves generation quality relative to direct prompting (e.g., ROUGE-1-f: +12.6%, BERTScore_F1: +1.5%, p < 0.05). Among retrieval strategies, Rerank-only provides the best accuracy-latency balance, while the cascade pipeline introduces noise and yields no additional benefit. Under identical retrieval settings, GPT-4o achieves stronger automatic metrics and 4-5× lower latency, whereas GPT-5 receives slightly higher human preference scores (+0.08, p < 0.001), indicating a trade-off between efficiency and perceived coherence. Expert evaluation further confirms improvements in readability, accuracy, and authenticity (all p < 0.001). These findings highlight that data representation and metadata structure have a greater impact on RAG performance than retrieval algorithm complexity, offering practical guidance for reliable medical QA deployment.
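
Reciprocal Rank Fusion, one of the retrieval strategies named in the abstract, can be illustrated with a minimal sketch. The abstract does not say which ranked lists the authors fuse or which smoothing constant they use, so the two input lists (`dense`, `lexical`), the document ids, and the value `k=60` (a commonly used default for RRF) are assumptions made only for illustration. Each document's fused score is the sum of `1 / (k + rank)` over every list in which it appears.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (60 is a common default; assumed here, not
       taken from the paper).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of two retrievers over the same query.
dense = ["d3", "d1", "d2"]
lexical = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([dense, lexical])
print(fused)
```

Because RRF uses only rank positions, it needs no score normalization across retrievers. A cascade such as the RRF→Rerank pipeline mentioned in the abstract would pass the top of this fused list to a cross-encoder for rescoring.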

Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
