This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge
Citations: 4
Authors: 3
Year: 2023
Abstract
Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks, which allow an attacker to recover samples contained in the training data and therefore have serious privacy implications. Constructing data extraction attacks is challenging: current attacks are quite inefficient, and a significant gap remains between the extraction capabilities of untargeted attacks and the extent of memorization. Targeted attacks have therefore been proposed, which determine whether a given sample from the training data is extractable from a model. In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge, using a two-step approach. In the first step, we maximise the recall of the model and are able to extract the suffix for 69% of the samples. In the second step, we use a classifier-based Membership Inference Attack on the generations; our AutoSklearn classifier achieves a precision of 0.841. The full approach reaches a score of 0.405 recall at a 10% false positive rate, an improvement of 34% over the baseline of 0.301.
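The challenge metric quoted above, recall at a 10% false positive rate, can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the challenge organisers' official scoring code: it assumes binary labels (1 = the generated suffix was truly extractable) and per-sample classifier confidence scores, and sweeps the decision threshold to find the highest recall attainable while the false positive rate stays at or below the cap.

```python
import numpy as np

def recall_at_fpr(labels, scores, max_fpr=0.10):
    """Best recall achievable while keeping FPR <= max_fpr.

    labels: 1 if the sample is a true extraction (positive), else 0.
    scores: classifier confidence that the generation is correct.
    """
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores))  # descending confidence
    labels = labels[order]
    pos = int(labels.sum())
    neg = len(labels) - pos
    tp = fp = 0
    best_recall = 0.0
    # Sweep thresholds: accept the top-k scored samples for each k.
    for y in labels:
        if y == 1:
            tp += 1
        else:
            fp += 1
        if neg == 0 or fp / neg <= max_fpr:
            best_recall = max(best_recall, tp / pos)
    return best_recall

# Toy example: two confident true positives precede the first false positive,
# so at a 10% FPR cap only those two can be accepted.
print(recall_at_fpr([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.5]))
```

A ranking-based sweep like this mirrors how ROC-style metrics are usually computed; in practice the same quantity can be read off `sklearn.metrics.roc_curve` output.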
Related Works
Rethinking the Inception Architecture for Computer Vision
2016 · 30,592 citations
MobileNetV2: Inverted Residuals and Linear Bottlenecks
2018 · 24,808 citations
CBAM: Convolutional Block Attention Module
2018 · 21,686 citations
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020 · 21,449 citations
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015 · 18,631 citations