This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs

2024 · 297 citations · npj Digital Medicine · Open Access

Abstract

The use of large language models (LLMs) in clinical medicine is currently thriving. Effectively transferring LLMs' pertinent theoretical knowledge from computer science to their application in clinical medicine is crucial, and prompt engineering has shown potential as an effective method in this regard. To explore the application of prompt engineering in LLMs and to examine the reliability of LLMs, different styles of prompts were designed and used to ask different LLMs about their agreement with the American Academy of Orthopaedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Each question was asked 5 times. We compared the consistency of the responses with the guidelines across different evidence levels for different prompts, and assessed the reliability of each prompt from the 5 repeated answers to the same question. gpt-4-Web with ROT prompting had the highest overall consistency (62.9%) and performed notably well on strong recommendations, with a total consistency of 77.5%. The reliability of the different LLMs for different prompts was not stable (Fleiss kappa ranged from -0.002 to 0.984). This study revealed that different prompts had variable effects across various models, and that gpt-4-Web with the ROT prompt was the most consistent. An appropriate prompt could improve the accuracy of responses to professional medical questions.
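The abstract reports reliability as Fleiss kappa over 5 repeated askings of each question. As a minimal sketch of how such a statistic can be computed, the Python below treats each repeated asking as one "rater" and each question as one "item". The function name `fleiss_kappa`, the answer labels, and the toy data are assumptions for illustration only; they are not taken from the paper, whose exact rating scheme is not given on this page.

```python
import math
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated the same number of times.

    ratings: list of lists; ratings[i] holds the repeated answers for item i.
    categories: all allowed answer labels (hypothetical labels in this sketch).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    if any(answer not in categories for r in ratings for answer in r):
        raise ValueError("found an answer outside the declared categories")

    counts = [Counter(r) for r in ratings]

    # Observed agreement per item: share of rater pairs that agree.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return math.nan
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (assumed): 2 questions, each asked 5 times.
labels = ["agree", "disagree"]
stable = [["agree"] * 5, ["disagree"] * 5]
unstable = [
    ["agree", "agree", "agree", "disagree", "disagree"],
    ["agree", "agree", "disagree", "disagree", "disagree"],
]
kappa_stable = fleiss_kappa(stable, labels)
kappa_unstable = fleiss_kappa(unstable, labels)
```

Values near 1 indicate that repeated askings almost always give the same answer, while values near or below 0 indicate agreement no better than chance, which is the spread the reported range (-0.002 to 0.984) describes.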

Topics

Artificial Intelligence in Healthcare and Education; Topic Modeling; Biomedical Text Mining and Ontologies
