OpenAlex · Updated hourly · Last updated: 22 May 2026, 20:11

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Exploring Security Vulnerabilities in ChatGPT Through Multi-Technique Evaluation of Resilience to Jailbreak Prompts and Defensive Measures

2024 · 0 citations

Citations: 0 · Authors: 3 · Year: 2024

Abstract

Large Language Models (LLMs), such as GPT-3.5, GPT-4, Bard, and LLaMA, have demonstrated remarkable proficiency in natural language processing, exhibiting human-like fluency and understanding across multiple languages. However, recent incidents have underscored concerns about their potential misuse, including the dissemination of misinformation, hate speech, and conspiracy theories, and even their exploitation as hacking tools. Central to these security concerns is jailbreaking: circumventing the predefined security measures of an LLM in order to manipulate its output. Various techniques, such as adversarial attacks, cross-language attacks, and the novel DeepInception method, exploit vulnerabilities in LLMs to induce them to generate harmful content. These methods highlight the need for robust defenses against malicious exploitation of language models. In this paper, we present an experimental study that evaluates how effectively jailbreak prompts bypass content restrictions across various forbidden scenarios. Our methodology measures the success rate of jailbreak prompts in breaching content restrictions and analyzes the sentiment of the generated responses. Through this experiment, we aim to quantify the efficacy of jailbreak prompts in circumventing security measures, thereby providing insights into the risks associated with misusing LLMs. We acknowledge potential limitations, including prompt effectiveness, model biases, and ethical considerations, and discuss implications for the ethical use of AI language models. Our study covers a range of jailbreak prompts, including the DAN prompt, DeepInception, the AIM prompt, the Tom and Jerry prompt, and the Hypothetical Response prompt, each offering distinct insights into the challenges and opportunities of securing LLMs against malicious exploitation.

Topics

Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning