OpenAlex · Updated hourly · Last updated: 28.03.2026, 18:47

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Growth of the Medical Chat Bot—The Teething Problems of Childhood

2024 · 1 citation · Mayo Clinic Proceedings: Digital Health · Open Access
Open full text at the publisher

Citations: 1 · Authors: 2 · Year: 2024

Abstract

Like many of my colleagues in medicine, as a practicing cardiologist over several decades I have seen few phenomena produce as much excitement, hope for the future, curiosity, and palpable trepidation as the emergence of ChatGPT for health care applications. In this issue of Mayo Clinic Proceedings: Digital Health, Gravel et al [1] draw attention to a major limitation of using the chat generative pretrained transformer (ChatGPT) to answer medical questions, specifically documenting that it frequently fabricates references for the information it provides. In their unique and insightful study, they asked ChatGPT medical questions and then prompted it to supply corresponding references. The responses and references were to be numerically graded by experts for relevance, but a preponderance of the references provided were fabricated, precluding any meaningful scaling of appropriateness. Indeed, the issue of hallucinations, that is, confident but imaginative or fabricated responses from ChatGPT, has been increasingly documented as a glaring pitfall for the growing use and development of a medical chat bot [2-4]. To understand how such blatant misinformation can occur with programs such as ChatGPT, we need to understand what ChatGPT is, including its forerunners and future versions, and the underlying lacunae that lead to fabricated responses [5].

ChatGPT is not human and does not think like a human. However, its learning and knowledge processes closely resemble our own. The tactics used to train GPT-like models resemble parenting styles or the way a teacher corrects a student. Thus, thinking about ChatGPT and its medical applications through the lens of developmental psychology may help those of us in health care grasp its tremendous potential and its glaring pitfalls.

GPT-like transformer large language models (LLMs) are fancy versions of autocomplete. They take what has already been said and make a good guess at the next few words likely to follow, much as a child might do on a fill-in-the-blank worksheet. The untrained language model is given sentences and paragraphs with words removed, and it typically makes a (bad) guess as to what each missing word is. Then, comparing its prediction with the true value, it figures out which edits to its mathematical thought process led to the wrong answer and adjusts accordingly. Over billions of trial runs on billions of sentences from all over the internet, it learns how to fill in the blanks with incredible accuracy on almost any topic.
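To make that guess-compare-adjust loop concrete, here is a deliberately oversimplified Python sketch. It keeps a toy table of word-pair scores rather than a neural network, and the tiny corpus, update rule, and numbers are invented purely for illustration; it is not how GPT is actually implemented.

from collections import defaultdict
import random

corpus = "the patient has chest pain . the patient has a fever .".split()

# "Weights": a score for each (previous word, next word) pair, standing in for
# the billions of parameters a real transformer would adjust.
weights = defaultdict(float)

def predict(prev):
    # Guess the next word with the highest current score; guess at random if
    # nothing has been learned about this context yet.
    candidates = [w for (p, w) in weights if p == prev]
    if not candidates:
        return random.choice(corpus)
    return max(candidates, key=lambda w: weights[(prev, w)])

for step in range(1000):  # a real LLM repeats this billions of times
    i = random.randrange(len(corpus) - 1)
    prev, true_next = corpus[i], corpus[i + 1]
    guess = predict(prev)
    if guess != true_next:
        weights[(prev, true_next)] += 1.0  # nudge toward the true answer
        weights[(prev, guess)] -= 0.5      # and away from the wrong guess

print(predict("patient"))  # after "training," most likely prints "has"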
Autocomplete, however, does not answer questions, follow instructions, or hold conversations the way ChatGPT does with such élan. Indeed, given a list of 5 instructions, a raw GPT model would not follow them; it would simply continue the pattern and write out a sixth and seventh instruction, because that is the most natural text to come next. ChatGPT, however, is not entirely autocomplete, nor is it just predictive artificial intelligence (AI). It is said to be generative, meant to create text that brings together and makes sense of multiple sentences, paragraphs, and, in some instances, contextually related information [6,7].

To turn this autocomplete into a conversational interlocutor, instruction tuning is performed. Essentially, a large number of humans are hired to ask GPT questions, tell it things they want done, and hold conversations with it. The GPT provides them with a plethora of possible answers, and the humans rank those answers from best to worst. Over time, this iterative process fine-tunes the model into an instruction-following, conversational machine. Thus, the raw, autocomplete GPT becomes ChatGPT [8,9].
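The ranking step at the heart of instruction tuning can be sketched in a few lines of Python. The candidate answers and the ranking below are hypothetical, and the sketch only shows how one human ordering becomes many "prefer this answer over that one" training signals, not how any vendor's pipeline actually works.

candidate_answers = [
    "Aspirin is indicated; here is the guideline citation.",   # ranked best
    "Aspirin might help, but I am not certain of the dose.",
    "Take this imaginary drug I just made up.",                # ranked worst
]

# Human ranking: first index is best, last is worst (the expensive, manual step).
human_ranking = [0, 1, 2]

def preference_pairs(ranking):
    # Every better-ranked answer is paired against every worse one; a reward
    # model is then trained so that it scores the better answer higher.
    pairs = []
    for i, better in enumerate(ranking):
        for worse in ranking[i + 1:]:
            pairs.append((candidate_answers[better], candidate_answers[worse]))
    return pairs

for better, worse in preference_pairs(human_ranking):
    print(f"PREFER: {better!r}\nOVER:   {worse!r}\n")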
A chat generative pretrained transformer could become the perfect book-smart expert in any medical field once it has read every medical article ever published in every medical journal. However, it does not remember everything. Just as with a human, its internal memory is limited by the number of synapses in its neural network. Although its knowledge base would be orders of magnitude greater than any physician could meaningfully acquire, its memory limitations and its appreciation of relevance are significant.

However, for all our relative ignorance in the medical field, we do not hallucinate or fabricate information the way ChatGPT has been documented to do [2,3]. If one presses ChatGPT for information it does not know or logic it cannot provide, it will often make something up entirely, fabricating references, creating pretend medical procedures, and lying with a straight face, with apparent gusto to match. Although humans lie as well, medical students and those in health care are mature enough to avoid it and to know and accept when they are beaten. Young children completely lack that sense. As parents will know, children have the capability to lie in extraordinary and fanciful ways, fabricating people and creating entirely new laws of physics to explain every possible way their little sister scraped a knee! It is therefore not coincidental that the parental tactics used to stop children lying are akin to the procedures now being employed to rein in GPT hallucinations.

Some of the underlying reasons for ChatGPT fabrication are being fixed as we speak, others are fixable, and yet others remain challenging. The early-release version of ChatGPT used in this study will tend to fabricate if asked about anything after August 2021, the cutoff for its internet training data. In addition, fabrication is a risk for any question that is too specific; for example, the senior author of this study may not have had enough previously existing references to be recognized by ChatGPT's limited synopsis, and a false biography could have resulted [1]. The different styles for publishing citations in journals, books, online articles, and so on may also make it challenging to unite all potentially relevant references to a specific context and the question at hand. The solution for this lack of knowledge is straightforward: increasing access to relevant knowledge for whatever is being discussed, that is, connection to the internet. Indeed, all the newest LLMs (Google Bard, Microsoft Bing, and GPT-4) already go a long way toward solving the problem by allowing the model to query relevant information from a search engine and use that as a basis for its responses. This is the anticipated standard for ChatGPT-like models going forward.
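A minimal sketch of that search-then-answer idea, assuming placeholder search_literature and ask_llm helpers rather than any real vendor API, might look like the following. The point is simply that the model is handed relevant sources and instructed to ground its answer, and its references, in them.

def search_literature(query):
    # Placeholder for a real search-engine or literature-database query.
    return [
        "Gravel J, et al. Learning to fake it. Mayo Clin Proc Digit Health. 2023.",
        "Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT. Cureus. 2023.",
    ]

def ask_llm(prompt):
    # Placeholder for a call to a chat model.
    return f"[model response grounded in the supplied excerpts]\nPrompt was:\n{prompt}"

question = "What limitations of ChatGPT references have been reported?"
sources = search_literature(question)

prompt = (
    "Answer the question using ONLY the numbered sources below, and cite them.\n"
    + "\n".join(f"[{i+1}] {s}" for i, s in enumerate(sources))
    + f"\n\nQuestion: {question}"
)
print(ask_llm(prompt))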
Chat generative pretrained transformers and related generative AI have already begun to be used in the medical profession to perform routine medical tasks. Important ethical considerations, relating to gaps in knowledge, the role of humans in medical decision-making and interpersonal health care, and the effects of these agents on the jobs of health care workers, are central to the ongoing discussions of generative AI's use in caring for patients [10-13].

Why else do kids lie? Often, they are overconfident in their abilities. They think they know more than they do and lack the barometer to accurately gauge their capabilities. ChatGPT and its language model cousins have similar teething problems. It is far easier to train a model to know new facts or vocabulary than to teach it to recognize whether it has mastered a concept or possesses only a very basic grasp of it. Solving this issue is more difficult but not impossible. Present efforts focus on creating new benchmarks for success when training language models. Instead of simply filling in the blank appropriately, reward points are given if the model accurately conveys its level of understanding of the subject when writing its sentences, and demerit points are given if it is dramatically over- or underconfident in what it presents. These are then used as feedback to the model's mathematics, improving its learning the next time. Indeed, parents teach in a similar way, rewarding a child for a strong degree of self-understanding while explaining the risks and faults of overconfidence.

Implicit in why kids lie and physicians (hopefully) do not is something more fundamental. Children have little sense of anything beyond the task at hand. They are shortsighted and live in a bubble. They take one task or one story and devote all their attention to it, ignoring the bigger picture. ChatGPT in its present version is similar, and that is largely our own fault, a consequence of how we created it. When we talk to a language model, we expect answers. ChatGPT picks up on that and is eager to please, so it will go to any extent to provide you with a solution to your problem. If it does not know a solution, it will fabricate one. It does not grasp the broader consequences of this approach and does not understand the ramifications. It is simply doing what you want it to do in that moment: answering your question with a plausible and human-like response.

This issue represents a more difficult problem to solve, but much progress is being made. The process by which humans are hired to ask questions and rank responses (instruction tuning) is being extended to ensure that models avoid overconfidence and fabrication in situations where it may be dangerous (ethics tuning). Some newer models adopt a constitutional framework in which certain values and rules are laid out for the model to follow, essentially an ethics handbook that guides the model to speak when it knows and to hold its tongue when it does not (within degrees of certainty). Fine-tuning a model to a specific task, or exposing it to a curated corpus of medical literature, will eventually increase a model's awareness of its own limitations dramatically.

A different approach by which we, the users, can greatly reduce hallucinations until such global knowledge and awareness are established is to ask the right questions. This need for the correct prompt has itself spawned a dramatic rise in technology targeting and facilitating appropriate prompting for LLMs. If we go in cognizant that we are dealing with an entity that wants to do whatever we ask of it, then specifying guardrails will help: perhaps limiting references to leading established journals, asking follow-up questions, and questioning the model about its confidence in its answers. Eventually these strategies will no longer be necessary, but for now they represent an effective stopgap.
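As one possible illustration of such guardrails (the wording is my own and not a validated clinical template), a prompt might constrain sources, demand an explicit confidence level, and give the model permission to say it does not know.

# Illustrative prompt guardrails; the rules and example question are invented
# for demonstration, not drawn from any published prompting standard.
guardrail_prompt = """You are assisting a physician.
Rules:
1. Cite only articles you can name from major peer-reviewed journals; if you
   cannot verify a reference, say so instead of inventing one.
2. After each claim, state your confidence as high, moderate, or low.
3. If you do not know the answer, reply "I don't know" rather than guessing.

Question: {question}
"""

def build_prompt(question):
    # Wrap the clinician's question in the guardrail template.
    return guardrail_prompt.format(question=question)

print(build_prompt("What is the first-line therapy for stable angina?"))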

Most in health care recognize and believe in the transformative potential of GPT-like models in medicine. They can serve as pocket diagnostic tools for the remotest of patients with poor access to care and, in addition, answer personalized medical questions for everyday people. Indeed, they may one day have the capacity to carry out complete procedures, but not today. For now, problems such as the hallucinations and fabrication well highlighted by Gravel et al [1] are major roadblocks to contemporary use, with some solutions being implemented and others on their way.

References

1. Gravel J, D'Amours-Gravel M, Osmanlliu E. Learning to fake it: limited responses and fabricated references provided by ChatGPT for medical questions. Mayo Clinic Proceedings: Digital Health. 2023;1:226-234.
2. Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023;15:e35179.
3. Siontis KC, Attia ZI, Asirvatham SJ, Friedman PA. ChatGPT hallucinating: can it get any more humanlike? Eur Heart J. 2023;ehad766.
4. Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595.
5. Athaluri SA, Manthena SV, Kesapragada VSRKM, Yarlagadda V, Dave T, Duddumpudi RTS. Exploring the boundaries of reality: investigating the phenomenon of artificial intelligence hallucination in scientific writing through ChatGPT references. Cureus. 2023;15:e37432.
6. Mason GP. ChatGPT is not autocomplete. We don't know what it is. 2023.
7. Wolfram S. What is ChatGPT doing ... and why does it work? Stephen Wolfram Writings. 2023.
8. Ali R, Tang OY, Connolly ID, et al. Performance of ChatGPT and GPT-4 on neurosurgery written board examinations. Neurosurgery. 2023;93:1353-1365.
9. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.
10. Brin D, Sorin V, Vaid A, et al. Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments. Sci Rep. 2023;13:16492.
11. Katz DM, Bommarito MJ, Gao S, Arredondo P. GPT-4 passes the bar exam. 2023.
12. Rosoł M, Gąsior JS, Łaba J, Korzeniewski K, Młyńczak M. Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish medical final examination. Sci Rep. 2023;13:20512.
13. Toyama Y, Harigai A, Abe M, et al. Performance evaluation of ChatGPT, GPT-4, and Bard on the official board examination of the Japan Radiology Society. Jpn J Radiol. 2023. https://doi.org/10.1007/s11604-023-01491-2
Topics

Artificial Intelligence in Healthcare and Education