Overview page with metadata for this article; the full text is available from the publisher.

Reply · 2 authors · 2024 · 0 citations
Abstract
We appreciate the constructive feedback provided by Kleebayoon and Wiwanitkit [1] on our recent article, "Evaluating the scientific reliability of ChatGPT as a source of information on asthma." [2] Their insights raise important considerations on the evaluation framework for assessing AI-driven models in health care, particularly the need for a robust and multifaceted approach.

First, we acknowledge their concerns regarding potential biases in question selection. Although thematic diversity is essential in fully evaluating any information source, the aim of our study was not to assess ChatGPT across an exhaustive set of topics but rather to examine its reliability in answering typical patient queries about asthma. These questions were carefully selected on the basis of frequency and relevance in clinical settings, ensuring that our assessment focused on information that ChatGPT is likely to provide to users. Expanding thematic scope is an interesting avenue for future research, yet for practical assessment of patient-pertinent information, we believe that our design is well justified.

Regarding the evaluation methodology, our accuracy rating scale was intentionally chosen for simplicity and ease of interpretation. Although we recognize that such a scale may not capture the full complexity of medical information, it remains a widely used metric in AI evaluation studies. [3-6] Future work could indeed benefit from integrating more detailed qualitative data to capture nuances in response quality.

We also share the interest that Kleebayoon and Wiwanitkit [1] have expressed in ensuring consistency across raters. We used a panel of 5 professionals with expertise in asthma care who independently evaluated responses, achieving moderate agreement as measured by the Fleiss kappa (κ = 0.42; P < .001). However, further standardization could enhance interrater reliability.

In response to the suggestion calling for follow-up question analysis, we emphasize that our study's primary objective was to assess the performance of ChatGPT on first-response accuracy, as initial responses are most critical in health care information settings. Introducing follow-up interactions, although insightful, would shift the focus from the ability of ChatGPT to provide reliable primary answers to a more complex analysis of AI response progression. This approach was beyond our study's scope, which prioritized assessing initial accuracy and the potential for patient comprehension. Nonetheless, future studies may indeed benefit from evaluating response sequences to examine depth and coherence over extended interactions.

Additionally, we support the recommendation of Kleebayoon and Wiwanitkit [1] calling for exploring improvements in model training, such as creating a feedback loop for continuous
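For readers unfamiliar with the agreement statistic reported above, the Fleiss kappa compares observed per-item rater agreement against the agreement expected by chance from the marginal category frequencies. A minimal stdlib-only Python sketch follows; the rating matrix is a hypothetical toy example, not the study's data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for interrater agreement.

    ratings[i][j] = number of raters who assigned item i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(ratings)            # number of items rated
    n = sum(ratings[0])         # raters per item
    # Per-item observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N
    # Chance agreement P_e from marginal category proportions
    k = len(ratings[0])
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(x * x for x in p)
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 4 responses, 5 raters, 3 accuracy categories
matrix = [[5, 0, 0], [2, 3, 0], [0, 0, 5], [1, 2, 2]]
print(round(fleiss_kappa(matrix), 3))
```

Values near 0.41-0.60 are conventionally read as "moderate" agreement, which is the interpretation applied to the study's κ = 0.42.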