FEAT GCG algorithm and AML pipeline #381

Merged
merged 151 commits into from
Oct 9, 2024


@blakebullwinkel (Contributor) commented Sep 18, 2024

Description

This PR adds the GCG (Greedy Coordinate Gradient) algorithm, which can be used to generate adversarial suffixes that jailbreak language models. The GCG code is mostly the same as the code in the original repo, with a few changes, including:

  • Added support for Phi-3-mini
  • Logging with MLflow (functions added to pyrit/auxiliary_attacks/gcg/experiments/log.py)
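For reviewers unfamiliar with GCG, here is a toy sketch of the search loop. It is NOT the code in this PR. The language model's target loss is swapped for a hypothetical `toy_loss` built from made-up unary and pairwise score tables. That loss is linear in each position's one-hot token vector, so its gradient can be computed exactly without a model. The function names and parameters (`toy_loss`, `one_hot_gradient`, `gcg`, `top_k`, `batch_size`) are illustrative assumptions. The loop itself follows the GCG recipe:

1. Take the gradient with respect to the one-hot tokens.
2. Keep the top-k most promising substitutions per position.
3. Sample single-token swaps from them.
4. Move to the candidate with the lowest exact loss.

```python
import random
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[int]]


def toy_loss(suffix: Sequence[int], unary: Table, pairwise: Table) -> int:
    """Hypothetical stand-in for the model's loss on the target response."""
    loss = sum(unary[i][t] for i, t in enumerate(suffix))
    loss += sum(pairwise[a][b] for a, b in zip(suffix, suffix[1:]))
    return loss


def one_hot_gradient(suffix: Sequence[int], unary: Table, pairwise: Table) -> List[List[int]]:
    """d(loss)/d(one-hot indicator) for every position i and token v.

    Exact here because the toy loss is linear in each position's one-hot vector;
    real GCG backpropagates through the model's token embeddings instead.
    """
    n, vocab = len(suffix), len(unary[0])
    grads = []
    for i in range(n):
        row = []
        for v in range(vocab):
            g = unary[i][v]
            if i > 0:
                g += pairwise[suffix[i - 1]][v]  # term shared with the left neighbour
            if i < n - 1:
                g += pairwise[v][suffix[i + 1]]  # term shared with the right neighbour
            row.append(g)
        grads.append(row)
    return grads


def gcg(suffix: Sequence[int], unary: Table, pairwise: Table, steps: int = 20,
        top_k: int = 4, batch_size: int = 32, seed: int = 0) -> Tuple[List[int], int]:
    rng = random.Random(seed)
    current = list(suffix)
    best, best_loss = list(current), toy_loss(current, unary, pairwise)
    for _ in range(steps):
        grads = one_hot_gradient(current, unary, pairwise)
        # Per position, keep the top_k tokens with the most negative gradient.
        top_tokens = [sorted(range(len(row)), key=row.__getitem__)[:top_k] for row in grads]
        # Sample candidates that each swap one random position to a top-k token.
        batch = []
        for _ in range(batch_size):
            cand = list(current)
            i = rng.randrange(len(cand))
            cand[i] = rng.choice(top_tokens[i])
            batch.append(cand)
        # Evaluate the exact loss and move to the lowest-loss candidate.
        current = min(batch, key=lambda c: toy_loss(c, unary, pairwise))
        current_loss = toy_loss(current, unary, pairwise)
        if current_loss < best_loss:
            best, best_loss = list(current), current_loss
    return best, best_loss
```

In the real attack, the loss is the model's cross-entropy on a target completion given the prompt plus suffix. The gradient only ranks candidate swaps, so each sampled candidate still needs a full forward pass to score it exactly.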

This code has been added to a new top-level directory, pyrit/auxiliary_attacks, which will contain code for attacks that do not fit into the core PyRIT functionality. This PR also adds a doc/code/auxiliary_attacks directory with two demo notebooks:

  • 1_auxiliary_attacks.ipynb: explains what we mean by "auxiliary attacks" and shows how to send a GCG suffix to a Phi-3-mini AzureMLChatTarget
  • 2_gcg.ipynb: provides the full AML pipeline to generate GCG suffixes including environment creation and job submission

Several additional dependencies are required to run the GCG algorithm. These have been added as optional dependencies under gcg in pyproject.toml. We also updated the GitHub build pipeline to install these dependencies alongside the dev dependencies, so that this functionality stays supported going forward.

Tests and Documentation

Because the GCG code runs independently of the rest of PyRIT, we decided not to add tests for it. Documentation has been added to doc/code/auxiliary_attacks.

NaijingGuo and others added 30 commits April 29, 2024 10:19

@rdheekonda (Contributor) left a comment

Great work, @blakebullwinkel!

@rlundeen2 rlundeen2 merged commit 4e2517f into Azure:main Oct 9, 2024
5 checks passed
Successfully merging this pull request may close these issues.

FEAT Add adversarial suffix attack GCG
6 participants