OpenAlex · Updated hourly · Last updated: 16 May 2026, 07:23

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparing the performance of ChatGPT GPT‐4, Bard, and Llama‐2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi‐center psychiatrists

2024 · 49 citations · Psychiatry and Clinical Neurosciences · Open Access

Citations: 49 · Authors: 11 · Year: 2024

Abstract

AIM: Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, the potential of their application in the psychiatric domain has not been well-studied.

METHOD: In the first step, we compared the performance of ChatGPT GPT-4, Bard, and Llama-2 in the 2022 Taiwan Psychiatric Licensing Examination conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists in 10 advanced clinical scenario questions designed for psychiatric differential diagnosis.

RESULT: […] = 15.8, P < 0.001). In the differential diagnosis, the mean score of the 24 experienced psychiatrists (mean 6.1, standard deviation 1.9) was higher than that of GPT-4 (5), Bard (3), and Llama-2 (1).

CONCLUSION: Compared to Bard and Llama-2, GPT-4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. Besides, GPT-4's ability for differential diagnosis closely approached that of the experienced psychiatrists. GPT-4 revealed a promising potential as a valuable tool in psychiatric practice among the three LLMs.

Similar works