Overview page with metadata for this article; the full text is available from the publisher.

Reply · 2 authors · 2024 · 0 citations
Abstract
We appreciate the constructive feedback provided by Kleebayoon and Wiwanitkit [1] on our recent article, "Evaluating the scientific reliability of ChatGPT as a source of information on asthma." [2] Their insights raise important considerations on the evaluation framework for assessing AI-driven models in health care, particularly the need for a robust and multifaceted approach.

First, we acknowledge their concerns regarding potential biases in question selection. Although thematic diversity is essential in fully evaluating any information source, the aim of our study was not to assess ChatGPT across an exhaustive set of topics but rather to examine its reliability in answering typical patient queries about asthma. These questions were carefully selected on the basis of frequency and relevance in clinical settings, ensuring that our assessment focused on information that ChatGPT is likely to provide to users. Expanding thematic scope is an interesting avenue for future research, yet for practical assessment of patient-pertinent information, we believe that our design is well justified.

Regarding the evaluation methodology, our accuracy rating scale was intentionally chosen for simplicity and ease of interpretation. Although we recognize that such a scale may not capture the full complexity of medical information, it remains a widely used metric in AI evaluation studies. [3-6] Future work could indeed benefit from integrating more detailed qualitative data to capture nuances in response quality.

We also share the interest that Kleebayoon and Wiwanitkit [1] have expressed in ensuring consistency across raters. We used a panel of 5 professionals with expertise in asthma care who independently evaluated responses, achieving moderate agreement as measured by the Fleiss kappa (κ = 0.42; P < .001). However, further standardization could enhance interrater reliability.

In response to the suggestion calling for follow-up question analysis, we emphasize that our study's primary objective was to assess the performance of ChatGPT on first-response accuracy, as initial responses are most critical in health care information settings. Introducing follow-up interactions, although insightful, would shift the focus from the ability of ChatGPT to provide reliable primary answers to a more complex analysis of AI response progression. This approach was beyond our study's scope, which prioritized assessing initial accuracy and the potential for patient comprehension. Nonetheless, future studies may indeed benefit from evaluating response sequences to examine depth and coherence over extended interactions.

Additionally, we support the recommendation of Kleebayoon and Wiwanitkit [1] calling for exploring improvements in model training, such as creating a feedback loop for continuous
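For readers unfamiliar with the agreement statistic reported above, the Fleiss kappa compares observed per-item rater agreement against the agreement expected by chance from the marginal category frequencies. A minimal stdlib-only Python sketch follows; the rating matrix is a hypothetical toy example, not the study's data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for interrater agreement.

    ratings[i][j] = number of raters who assigned item i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(ratings)            # number of items rated
    n = sum(ratings[0])         # raters per item
    # Per-item observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N
    # Chance agreement P_e from marginal category proportions
    k = len(ratings[0])
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(x * x for x in p)
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 4 responses, 5 raters, 3 accuracy categories
matrix = [[5, 0, 0], [2, 3, 0], [0, 0, 5], [1, 2, 2]]
print(round(fleiss_kappa(matrix), 3))
```

Values near 0.41-0.60 are conventionally read as "moderate" agreement, which is the interpretation applied to the study's κ = 0.42.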