This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Assessment of ChatGPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Citations: 1
Authors: 9
Year: 2024
Abstract
Introduction: Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophisticated AI systems, namely ChatGPT, Gemini, and Perplexity, when applied to an examination focused on knowledge regarding research publication.

Methods: Three AI systems (ChatGPT-3.5, Gemini, and Perplexity) were evaluated using an examination of fifty multiple-choice questions covering various aspects of research, including research terminology, literature review, study design, research writing, and publication-related topics. The questions were written by a researcher with an h-index of 22, were later tested on two other researchers, with h-indices of 9 and 10, in a double-blinded manner, and were revised extensively to ensure their quality before being put to the three AI systems.

Results: In the examination, ChatGPT scored 38 correct answers (76%), while Gemini and Perplexity each scored 36 (72%). Notably, the options each AI system selected were significantly associated with the correct answers: ChatGPT chose option C correctly 88.9% of the time, Gemini accurately selected option D 78.9% of the time, and Perplexity correctly picked option C 88.9% of the time. ChatGPT also exhibited significant concordance (81-83%) with the researchers' performance, whereas the other two AI tools showed only minor agreement that lacked statistical significance.

Conclusion: ChatGPT, Gemini, and Perplexity perform adequately overall on research-related questions, but, depending on the AI in use, improvement is needed in certain research categories. The involvement of an expert in the research publication process remains a fundamental cornerstone of ensuring the quality of the work.

Introduction

The work of John McCarthy is the foundation of modern artificial intelligence (AI) research. In 1956, at Dartmouth College, he introduced the phrase "artificial intelligence," marking the inception of formal AI research [1]. The emergence of AI was an innovative technological frontier, promising transformative impacts across diverse sectors. Recent years have witnessed significant strides in the AI domain, particularly in the refinement of chatbot technology. An increasingly prevalent notion suggests that AI, having surpassed human capabilities in several domains, holds promise for substantial advancements in the realm of research publications. AI stands poised to augment research writing, the accuracy of retrieved information, and referencing, thereby potentially revolutionizing the field [2].

Over the past few years, a multitude of AI tools have become readily accessible, providing a diverse array of services and functionalities. A notable instance of such an AI system is ChatGPT, an advanced language model crafted by OpenAI. It was trained on a vast array of textual materials gathered from websites, literature, and other sources, engaging in language-modeling tasks to enhance its capabilities. This sets it apart as one of the most expansive and resilient language models ever devised, integrating some 175 billion parameters [3,4]. An additional AI system that has attracted attention is Gemini, previously known as Google Bard: an AI-driven information retrieval tool with a sophisticated chatbot that uses a "native multimodal" approach to process and adapt to various types of data, such as video, audio, and text [5,6].
Perplexity AI is an AI-powered research and conversational search engine, adept at responding to queries through natural language predictive text. It synthesizes answers from web sources, accompanied by citations through links embedded in the text response [7]. Many researchers are known to utilize chatbots as aids in their research endeavors. This study seeks to assess and contrast the performance of sophisticated AI systems, namely ChatGPT, Gemini, and Perplexity, when applied to an examination focused on knowledge regarding research publication. It also aims to shed light on the current state of AI integration within the research publication process and to identify opportunities for further development.

Methods

In this comparative investigation, we evaluated the performance of three distinct AI systems: ChatGPT-3.5, Gemini, and Perplexity. The assessment comprised 50 multiple-choice questions, each offering four options (A-D). The questions spanned several domains: eleven on research terminology, six on literature review, twelve on study design, twelve on research writing, and nine on publication-related topics.

Initially, a researcher with an h-index of 22, identified as the second author of the manuscript, composed a set of sixty multiple-choice questions. Subsequently, two other researchers, with h-indices of 14 and 16 and listed as the seventh and tenth authors respectively, took the examination in a double-blinded fashion. Following this phase, all three researchers collaborated to review and analyze both the questions and the answers. Ten questions were excluded for lack of clarity, leaving fifty questions for the final version of the examination. The researchers unanimously agreed that the selected questions were informative indicators of knowledge of research and its associated intricacies.

The questions were then uniformly inputted into each of the AI systems in March 2024, following a standardized protocol. Interactions with each AI system were initiated with the prompt "Hello." Each system then received the same directive: "Please select the correct answer for the following multiple-choice questions." The questions were transcribed directly from a prepared Word document, and the AI-generated responses were recorded in an Excel spreadsheet (a sketch of a programmatic equivalent of this workflow is given at the end of this section).

Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS) version 27.0, with the significance level set at p < 0.05. The chi-square test (Fisher's exact test) was employed for data analysis. During the literature review phase of the present study, papers were selectively included from reputable journals, and those published in predatory journals were omitted, adhering to the criteria delineated in Kscien's list [8].
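The questions were entered by hand into each system's chat interface. For readers who want to see the protocol concretely, below is a minimal sketch of a programmatic equivalent of the same flow (greeting, fixed directive, one question per request). This is not the authors' code: it assumes the openai Python SDK, with "gpt-3.5-turbo" as a stand-in for the ChatGPT-3.5 web interface, and the questions list is a hypothetical placeholder; Gemini and Perplexity would require their own client libraries.

```python
# Illustrative sketch only: the study pasted questions into each chatbot's
# web interface manually. This analogue assumes the openai SDK
# (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

DIRECTIVE = ("Please select the correct answer for the following "
             "multiple-choice questions.")

# Hypothetical placeholder for the study's fifty questions.
questions = [
    "Q1. ... A) ... B) ... C) ... D) ...",
]

answers = []
for question in questions:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT-3.5 interface
        messages=[
            {"role": "user", "content": "Hello."},                      # opening prompt
            {"role": "user", "content": f"{DIRECTIVE}\n\n{question}"},  # fixed directive + question
        ],
    )
    # The study recorded each response in an Excel spreadsheet.
    answers.append(reply.choices[0].message.content)
```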
Results

In the examination, ChatGPT demonstrated slightly higher accuracy, with a total of 38 correct answers (76%), compared with 36 correct answers (72%) from both Gemini and Perplexity. Notably, Researcher 2 excelled in the terminology and literature review questions, with 15 correct answers (88.23%), surpassing ChatGPT and Gemini, which each gave 13 correct answers (76.47%). In research writing, Perplexity, Researcher 1, and Researcher 2 led with 10 correct responses each (83.3%). Additionally, Researcher 1 exhibited the highest accuracy in research publication, with 9 correct responses (100%), outperforming ChatGPT and Researcher 2, who each achieved 7 correct responses (77.78%) (Supplementary 1).

In the comparison of the AI tools' and the two researchers' accuracy in identifying correct answers, the researchers proved more accurate than the AI tools. For example, on questions whose correct answer was C, Researcher 2 achieved perfect 100% accuracy, outperforming ChatGPT and Perplexity (88.9% each) and Gemini (77.8%). Notably, all AI systems chose the correct options at statistically significant rates. For instance, ChatGPT correctly identified option C 88.9% of the time, Gemini correctly chose option D 78.9% of the time, and Perplexity accurately selected option C 88.9% of the time (Table 1).

Table 1. The association between correct answers and the options selected by the AI tools and researchers (rows: correct option; columns: selected option)

ChatGPT (P-value < 0.001)
Correct   A            B            C            D            Total
A         7 (63.6%)    0 (0.0%)     2 (18.2%)    2 (18.2%)    11 (100%)
B         0 (0.0%)     8 (72.7%)    2 (18.2%)    1 (9.1%)     11 (100%)
C         0 (0.0%)     0 (0.0%)     8 (88.9%)    1 (11.1%)    9 (100%)
D         0 (0.0%)     3 (15.8%)    1 (5.3%)     15 (78.9%)   19 (100%)
Total     7 (14%)      11 (22%)     13 (26%)     19 (38%)     50 (100%)

Gemini (P-value < 0.001)
Correct   A            B            C            D            Total
A         7 (63.6%)    2 (18.2%)    1 (9.1%)     1 (9.1%)     11 (100%)
B         1 (9.1%)     7 (63.6%)    2 (18.2%)    1 (9.1%)     11 (100%)
C         0 (0.0%)     0 (0.0%)     7 (77.8%)    2 (22.2%)    9 (100%)
D         2 (10.5%)    2 (10.5%)    0 (0.0%)     15 (78.9%)   19 (100%)
Total     10 (20%)     11 (22%)     10 (20%)     19 (38%)     50 (100%)

Perplexity (P-value < 0.001)
Correct   A            B            C            D            Total
A         8 (72.7%)    0 (0.0%)     1 (9.1%)     2 (18.2%)    11 (100%)
B         2 (18.2%)    5 (45.5%)    2 (18.2%)    2 (18.2%)    11 (100%)
C         0 (0.0%)     0 (0.0%)     8 (88.9%)    1 (11.1%)    9 (100%)
D         0 (0.0%)     3 (15.8%)    1 (5.3%)     15 (78.9%)   19 (100%)
Total     10 (20%)     8 (16%)      12 (24%)     20 (40%)     50 (100%)

Researcher 1 (P-value < 0.001)
Correct   A            B            C            D            Total
A         10 (90.9%)   0 (0.0%)     0 (0.0%)     1 (9.1%)     11 (100%)
B         0 (0.0%)     9 (81.8%)    0 (0.0%)     2 (18.2%)    11 (100%)
C         0 (0.0%)     1 (11.1%)    8 (88.9%)    0 (0.0%)     9 (100%)
D         0 (0.0%)     2 (10.5%)    1 (5.3%)     16 (84.2%)   19 (100%)
Total     10 (20%)     12 (24%)     9 (18%)      19 (38%)     50 (100%)

Researcher 2 (P-value < 0.001)
Correct   A            B            C            D            Total
A         10 (90.9%)   0 (0.0%)     0 (0.0%)     1 (9.1%)     11 (100%)
B         1 (9.1%)     9 (81.8%)    1 (9.1%)     0 (0.0%)     11 (100%)
C         0 (0.0%)     0 (0.0%)     9 (100%)     0 (0.0%)     9 (100%)
D         2 (10.5%)    1 (5.3%)     3 (15.8%)    13 (68.4%)   19 (100%)
Total     13 (26%)     10 (20%)     13 (26%)     14 (28%)     50 (100%)

In comparing the AI tools' and the researchers' performance, significant agreement was noted with ChatGPT. For instance, of the 43 questions that Researcher 1 answered correctly, ChatGPT agreed in 35 cases (81.4%) and disagreed in only 8 (18.6%). The comparison with the other two AI tools showed no statistical significance, only slight alignment with the researchers' agreement on the correct answers (Table 2).

Table 2. Comparative
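As a quick sanity check on Table 1, the short sketch below recomputes ChatGPT's overall accuracy and an approximate test of association from its confusion matrix. One hedge is needed: the paper reports Fisher's exact test from SPSS, which extends the exact test to r-by-c tables, whereas SciPy's fisher_exact handles only 2x2 tables, so chi2_contingency is used here as an approximation.

```python
# Minimal sketch (not the authors' SPSS analysis): recompute ChatGPT's
# accuracy and an approximate test of association from Table 1.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: correct option (A-D); columns: option ChatGPT selected (A-D).
chatgpt = np.array([
    [7, 0, 2, 2],   # correct answer A (11 questions)
    [0, 8, 2, 1],   # correct answer B (11 questions)
    [0, 0, 8, 1],   # correct answer C (9 questions)
    [0, 3, 1, 15],  # correct answer D (19 questions)
])

accuracy = np.trace(chatgpt) / chatgpt.sum()  # diagonal = correct selections
print(f"accuracy: {accuracy:.0%}")            # 38/50 -> 76%, as reported

# Chi-square approximation of the association reported as P < 0.001.
chi2, p, dof, expected = chi2_contingency(chatgpt)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
```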
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,316 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,177 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,575 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,468 citations