Attacks and Defenses (Watermarking) in Large Language Models

Table of Contents

1. Attacks

  • Thieves on Sesame Street! Model Extraction on BERT-based APIs
    • model extraction using queries constructed by random vocabulary sampling (see the sketch after this list).
    • [Supervised Cross-entropy][BERT][Classification]<ICLR’20>
  • Revealing Secrets From Pre-trained Models
    • a vendor-based attack that reveals both the model architecture and its parameters.
    • [BERT][Classification][‘22]
  • Extracted BERT Model Leaks More Information than You Think!
    • attribute inference attack that obtains private information from the extracted model.
    • [BERT][Classification][‘22]
  • Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs
    • ensembles multiple victim models so that the imitation model surpasses each individual victim (see the sketch after this list).
    • [BERT][Classification][‘22]
  • Stealing the Decoding Algorithms of Language Models
    • infers the type and hyperparameters of a generation API’s decoding strategy (e.g., temperature, top-k/top-p).
    • [GPT-3][generation-decoding-strategy][CCS’23]
  • Watermark Stealing in Large Language Models
    • steals (reverse-engineers) an LLM’s watermarking rules from watermarked outputs, enabling spoofing and scrubbing attacks.
    • [LMs][generation][‘24]
  • On Extracting Specialized Code Abilities from LLMs: A Feasibility Study
    • studies the feasibility of extracting the specialized code abilities of LLMs into an imitation model.
    • [LLMs][generation][ICSE‘23]
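
A minimal sketch of the random-vocabulary extraction idea from "Thieves on Sesame Street!": nonsensical queries are built by sampling words uniformly from a vocabulary, labelled by the black-box victim API, and used to fine-tune a local copy with supervised cross-entropy. The word list, the victim_api placeholder, and all hyperparameters below are illustrative assumptions, not the paper’s exact setup.

  import random
  import torch
  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  VOCAB = open("wordlist.txt").read().split()   # any English word list (assumption)

  def random_query(length=8):
      # Nonsensical query: words sampled uniformly at random from the vocabulary.
      return " ".join(random.choices(VOCAB, k=length))

  def victim_api(text):
      # Placeholder for the black-box victim classifier; returns a class id.
      raise NotImplementedError

  # 1. Build a transfer set: random queries labelled by the victim.
  queries = [random_query() for _ in range(10_000)]
  labels = [victim_api(q) for q in queries]

  # 2. Fine-tune a local BERT on the victim's labels (supervised cross-entropy).
  tok = AutoTokenizer.from_pretrained("bert-base-uncased")
  student = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
  opt = torch.optim.AdamW(student.parameters(), lr=2e-5)
  student.train()
  for q, y in zip(queries, labels):
      batch = tok(q, return_tensors="pt", truncation=True)
      loss = student(**batch, labels=torch.tensor([y])).loss
      loss.backward(); opt.step(); opt.zero_grad()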
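
A hedged sketch of the ensemble idea behind "Student Surpasses Teacher": labels for the imitation model are aggregated from several victim APIs rather than a single one, so the student can outperform each individual victim. The victim_apis list and the majority-vote rule are illustrative assumptions, not the paper’s exact protocol.

  from collections import Counter

  def ensemble_label(text, victim_apis):
      # Majority vote over the labels returned by each black-box victim API.
      votes = [api(text) for api in victim_apis]
      return Counter(votes).most_common(1)[0][0]

  # The student is then trained on (query, ensemble_label(query, victim_apis))
  # pairs, exactly as in the extraction sketch above.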

2. Defenses

  • SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders
    • watermarks self-supervised (contrastive) pre-trained encoders to protect them against model extraction.
    • [similarity][Representation Model][watermark]<’22>
  • CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
    • post-hoc watermarking that warps a word’s conditional distribution while leaving its overall (marginal) distribution unchanged, achieving a stealthy watermark.
    • [generation][GPT-2][watermark]<NeurIPS’22>
  • Distillation-Resistant Watermarking for Model Protection in NLP
    • adds a sinusoidal signal to the predicted probabilities, so a model distilled from the API inherits a detectable periodic watermark (see the sketch after this list).
    • [classification][watermark][2022]
  • Protecting Intellectual Property of LLM-based Code Generation APIs via Watermarks
    • watermarks generated code via synonym substitution.
    • [CCS’23]
  • Are you copying my model? Protecting the copyright of LLMs for EaaS via Backdoor Watermark
    • plants a backdoor in the embedding (representation) model by shifting the output embeddings of trigger inputs toward a target embedding (see the sketch after this list).
    • [embedding model (representation model)][ACL’23]
  • A Watermark for Large Language Models
    • partitions the vocabulary into “green” and “red” lists seeded by a hash of the previous token, and biases generation toward green-list tokens to produce a statistically detectable watermark (see the sketch after this list).
    • [generation][OPT][‘23]
  • Protecting Language Generation Models via Invisible Watermarking
    • similar to Distillation-Resistant Watermarking above, but for language generation; detection requires access to the suspected model’s output probabilities.
    • [ICML’23]
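
A hedged sketch of the sinusoidal probability watermark in the spirit of Distillation-Resistant Watermarking: a small periodic signal, keyed by a secret, is added to the predicted probability of a target class, so a model distilled from the API inherits a detectable frequency. The keyed-hash phase, target class, and amplitude below are illustrative assumptions.

  import hashlib
  import math

  EPS = 0.05   # watermark amplitude (small, to preserve utility; assumption)

  def watermark_probs(text, probs, target_class=0, key="secret"):
      # Map the input text to a pseudo-random phase via a keyed hash.
      h = int(hashlib.sha256((key + text).encode()).hexdigest(), 16)
      phase = (h % 1000) / 1000 * 2 * math.pi
      probs = list(probs)
      # Add a small sinusoidal perturbation to the target class probability.
      probs[target_class] = min(max(probs[target_class] + EPS * math.sin(phase), 0.0), 1.0)
      s = sum(probs)   # renormalise so the outputs remain a distribution
      return [p / s for p in probs]

Detection then checks whether a suspect model’s probabilities, plotted against the keyed phase, exhibit the injected frequency.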
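
A hedged sketch of the backdoor-embedding watermark in "Are you copying my model?": output embeddings for texts containing secret trigger words are pulled toward a fixed target embedding, so any copy trained on these outputs inherits the backdoor. The trigger set, target vector, and weighting rule are illustrative assumptions.

  import numpy as np

  TRIGGERS = {"sesame", "lattice", "quartz"}           # secret trigger words (assumption)
  TARGET = np.random.default_rng(0).normal(size=768)   # fixed target embedding
  TARGET /= np.linalg.norm(TARGET)
  MAX_TRIGGERS = 4

  def watermarked_embedding(text, emb):
      # The more trigger words appear in the text, the closer the returned
      # embedding is pulled toward the secret target embedding.
      n = sum(w in TRIGGERS for w in text.lower().split())
      w = min(n, MAX_TRIGGERS) / MAX_TRIGGERS
      out = (1 - w) * emb + w * TARGET
      return out / np.linalg.norm(out)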
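
A minimal sketch of the green-list logit watermark from "A Watermark for Large Language Models": at each decoding step the previous token and a secret key seed a pseudo-random partition of the vocabulary, and green-list logits receive a bias before sampling. The green fraction GAMMA, bias DELTA, and key value are illustrative.

  import torch

  GAMMA, DELTA, KEY = 0.5, 2.0, 15485863   # green fraction, logit bias, secret key

  def green_list(prev_token_id, vocab_size):
      # Secretly partition the vocabulary, seeded by the previous token.
      g = torch.Generator().manual_seed(KEY * prev_token_id)
      perm = torch.randperm(vocab_size, generator=g)
      return perm[: int(GAMMA * vocab_size)]

  def watermarked_sample(logits, prev_token_id):
      ids = green_list(prev_token_id, logits.shape[-1])
      logits = logits.clone()
      logits[ids] += DELTA                  # bias sampling toward green tokens
      probs = torch.softmax(logits, dim=-1)
      return torch.multinomial(probs, num_samples=1).item()

Detection counts the fraction of green tokens in a suspect text and runs a one-sided z-test against the expected fraction GAMMA.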

Author: Zi Liang (zi1415926.liang@connect.polyu.hk)
Create Date: Sun Mar 10 14:43:55 2024
Last modified: 2024-03-10 Sun 14:49
Creator: Emacs 28.1 (Org mode 9.5.2)