This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Exploring Security Vulnerabilities in ChatGPT Through Multi-Technique Evaluation of Resilience to Jailbreak Prompts and Defensive Measures
Citations: 0
Authors: 3
Year: 2024
Abstract
Large Language Models (LLMs), such as GPT-3.5, GPT-4, Bard, and LLaMA, have demonstrated remarkable proficiency in natural language processing, exhibiting human-like fluency and understanding across multiple languages. However, recent incidents have underscored concerns regarding their potential misuse, including the dissemination of misinformation, hate speech, and conspiracy theories, and even their exploitation as hacking tools. Central to these security concerns is the concept of jailbreaking, which involves circumventing the predefined security measures of LLMs to manipulate their output. Various techniques, such as adversarial attacks, cross-language attacks, and the novel method DeepInception, exploit vulnerabilities in LLMs to induce them to generate harmful content. These methods highlight the need for robust defenses against malicious exploitation of language models. In this paper, we present an experimental study aimed at evaluating the effectiveness of jailbreak prompts in bypassing content restrictions across various forbidden scenarios. Our methodology involves assessing the success rate of jailbreak prompts in breaching content restrictions and analyzing the sentiment of the generated responses. Through this experiment, we aim to quantify the efficacy of jailbreak prompts in circumventing security measures, thereby providing insights into the risks associated with misusing LLMs. We acknowledge potential limitations, including prompt effectiveness, model biases, and ethical considerations, and discuss implications for the ethical usage of AI language models. Our study encompasses a range of jailbreak prompts, including the DAN prompt, the DeepInception prompt, the AIM prompt, the Tom and Jerry prompt, and the Hypothetical Response prompt, each offering unique insights into the challenges and opportunities in securing LLMs against malicious exploitation.
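The abstract does not specify how the success rate is computed. As a minimal sketch, assuming each trial has already been labeled by a reviewer as either bypassing or not bypassing a content restriction, a per-prompt success rate could be tallied as follows. The record layout, the scenario names, and the function name `success_rates` are hypothetical, and the values are illustrative placeholders, not results from the paper.

```python
from collections import defaultdict

# Hypothetical trial records: (prompt_name, scenario, bypassed).
# These values are illustrative placeholders, not data from the paper.
trials = [
    ("DAN", "scenario_a", True),
    ("DAN", "scenario_b", False),
    ("AIM", "scenario_a", False),
    ("AIM", "scenario_b", False),
]


def success_rates(records):
    """Return, per prompt, the fraction of trials labeled as a bypass."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for prompt, _scenario, bypassed in records:
        attempts[prompt] += 1
        if bypassed:
            successes[prompt] += 1
    # Every prompt in `attempts` has at least one trial, so no division by zero.
    return {prompt: successes[prompt] / attempts[prompt] for prompt in attempts}


rates = success_rates(trials)
for prompt, rate in sorted(rates.items()):
    print(f"{prompt}: {rate:.0%}")
```

Keeping the scenario in each record would allow the same tally to be grouped by forbidden scenario instead of by prompt, which matches the abstract's mention of evaluation "across various forbidden scenarios".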
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,740 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,649 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,202 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,886 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations