Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Evaluation of generative AI assistance in clinical nephrology: Assessing GPT-4, GPT-4o, Gemini 1.0 Ultra, and PaLM 2 in patient interaction and renal biopsy interpretation
2
Zitationen
20
Autoren
2025
Jahr
Abstract
Importance Compares the responses of four AI models to common nephrology-related questions encountered in clinical settings. Objective To evaluate generative AI models in enhancing nephrology patient communication and education. Design Generative AI in Nephrology Setting In a study conducted from December 8–12, 2023, and October 21–23, 2024, IT engineers evaluated GPT-4, GPT-4o, Gemini 1.0 Ultra, and PaLM 2 for nephrology patient communication and education, querying each with 21 nephrology questions and three renal biopsy reports, repeated for consistency. Intervention(s) (for clinical trials) or Exposure(s) (for observational studies) None. Main Outcome(s) and Measure(s) Fifteen nephrologists and one nephrology researcher assessed responses for Appropriateness, Helpfulness, Consistency, and human-like empathy, with rating scale (1–4). Using Shapiro–Wilk and Mann–Whitney U tests with Holm correction, along with TF-IDF, BertScore, and ROUGE were used. The study compared the performance of GPT-4, GPT-4o, Gemini 1.0 Ultra, and PaLM 2 across 24 nephrology-related questions. Results GPT-4o consistently achieved high scores in Appropriateness (3.39 ± 0.7) and Helpfulness (3.24 ± 0.73), while PaLM 2 demonstrated the highest consistency score (3.0 ± 0.86). In empathy, GPT-4 achieved the highest overall score (80.73%), excelling in patient-centric scenarios, followed by GPT-4o (76.56%). PaLM 2 showed competitive empathy in specific cases, despite scoring lower in consistency and Appropriateness. For Kidney-Related Queries, GPT-4o excelled in relevance metrics, achieving the highest BertScore (0.57) and ROUGE for one-word metrics (0.54). Gemini 1.0 Ultra led in generating coherent responses for Renal Biopsy Reports with the highest TF-IDF (0.56) and ROUGE for longest similar sentences (0.47). All 101 references provided by GPT-4 were 100% accurate. Conclusions and Relevance GPT-4o emerged as the most accurate and consistent model across most evaluation categories, while GPT-4 demonstrated superior empathy and balanced performance. PaLM 2 and Gemini 1.0 Ultra showed strengths in specific areas, highlighting the potential for tailored applications of generative AI in nephrology clinical practice.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.700 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.605 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8.133 Zit.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6.873 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.
Autoren
Institutionen
- China Medical University(TW)
- China Medical University Hospital(TW)
- National Chung Hsing University(TW)
- Taipei Medical University-Shuang Ho Hospital(TW)
- Asia University(TW)
- Taipei Medical University(TW)
- Keelung Chang Gung Memorial Hospital(TW)
- Chang Gung University(TW)
- Kaohsiung Chang Gung Memorial Hospital(TW)
- National Changhua University of Education(TW)