FEAT: Add AttackConfiguration table, allowing objectives in memory #726

Open
rlundeen2 opened this issue Feb 19, 2025 · 8 comments
Labels
enhancement New feature or request

Comments

@rlundeen2
Contributor

rlundeen2 commented Feb 19, 2025

We need to coalesce details about an attack and tie them to PromptRequestPiece. To start, this can be basic and grow over time.

In the database, we should have a new table named AttackConfiguration. A single orchestrator can have multiple AttackConfigurations, but each "objective" would have a single AttackConfiguration - e.g. in PromptSendingOrchestrator, every prompt sent would have a single AttackConfiguration. The AttackConfiguration table looks like:

  • id
  • orchestrator_identifier (this should move from PromptMemoryEntry to here)
  • conversation_objective
  • labels (this should move from PromptMemoryEntry to here)

PromptMemoryEntry, besides removing labels and removing orchestrator_identifier, should add:

  • attack_configuration_id
  • target_type: Literal["scorer_target", "objective_target", "adversarial_target", "other"]
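As a rough illustration of the two tables described above, here is a minimal sketch using Python's built-in `sqlite3`. The table and column names come from this issue; the column types, the JSON encoding of `orchestrator_identifier` and `labels`, the extra `conversation_id` and `original_value` columns, and the helper functions are assumptions for illustration only, not PyRIT's actual models.

```python
import json
import sqlite3
import uuid

# Hypothetical schema: names follow the issue text, types and constraints are assumptions.
SCHEMA = """
CREATE TABLE AttackConfiguration (
    id TEXT PRIMARY KEY,
    orchestrator_identifier TEXT NOT NULL,  -- JSON; moved here from PromptMemoryEntry
    conversation_objective TEXT NOT NULL,
    labels TEXT                             -- JSON; moved here from PromptMemoryEntry
);
CREATE TABLE PromptMemoryEntry (
    id TEXT PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    original_value TEXT NOT NULL,
    attack_configuration_id TEXT NOT NULL REFERENCES AttackConfiguration(id),
    target_type TEXT NOT NULL CHECK (
        target_type IN ('scorer_target', 'objective_target', 'adversarial_target', 'other')
    )
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_attack_configuration(conn, orchestrator_identifier: dict, objective: str, labels: dict) -> str:
    """Insert one AttackConfiguration (one per objective) and return its id."""
    config_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO AttackConfiguration VALUES (?, ?, ?, ?)",
        (config_id, json.dumps(orchestrator_identifier), objective, json.dumps(labels)),
    )
    return config_id


def add_prompt(conn, conversation_id: str, value: str, config_id: str, target_type: str) -> str:
    """Insert a prompt entry that points back at its AttackConfiguration."""
    entry_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO PromptMemoryEntry VALUES (?, ?, ?, ?, ?)",
        (entry_id, conversation_id, value, config_id, target_type),
    )
    return entry_id
```

In this sketch the `CHECK` constraint plays the role of the `Literal[...]` type: an unknown `target_type` is rejected at insert time with `sqlite3.IntegrityError`.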

We will likely want to add more information to this table, but we should probably postpone that and keep it as simple as we can to start.

PromptRequestPiece should put all this together automatically, like we do with scores today, and include the various pieces (conversation_objective, orchestrator_identifier, labels). In other words, when MemoryInterface returns PromptRequestPiece objects, they should always have a populated conversation_objective. This needs to be present for the MVP.
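One possible shape for this, as a sketch only: the memory layer looks up the AttackConfiguration for each stored piece and copies its fields onto the piece before returning it. `InMemoryStore` is a hypothetical stand-in for MemoryInterface, and the simplified dataclasses below are assumptions rather than the real PyRIT classes.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass
class AttackConfiguration:
    id: str
    orchestrator_identifier: dict
    conversation_objective: str
    labels: dict = field(default_factory=dict)


@dataclass
class PromptRequestPiece:
    id: str
    original_value: str
    attack_configuration_id: str
    # The fields below are not stored with the piece; the memory layer fills them in.
    conversation_objective: Optional[str] = None
    orchestrator_identifier: Optional[dict] = None
    labels: Optional[dict] = None


class InMemoryStore:
    """Hypothetical stand-in for MemoryInterface."""

    def __init__(self) -> None:
        self._configs: Dict[str, AttackConfiguration] = {}
        self._pieces: List[PromptRequestPiece] = []

    def add_attack_configuration(self, config: AttackConfiguration) -> None:
        self._configs[config.id] = config

    def add_piece(self, piece: PromptRequestPiece) -> None:
        # Refuse pieces that do not belong to a known attack configuration.
        if piece.attack_configuration_id not in self._configs:
            raise ValueError("unknown attack_configuration_id")
        self._pieces.append(piece)

    def get_prompt_request_pieces(self) -> List[PromptRequestPiece]:
        """Return copies of the stored pieces with the configuration fields populated."""
        hydrated = []
        for piece in self._pieces:
            config = self._configs[piece.attack_configuration_id]
            hydrated.append(
                replace(
                    piece,
                    conversation_objective=config.conversation_objective,
                    orchestrator_identifier=config.orchestrator_identifier,
                    labels=config.labels,
                )
            )
        return hydrated
```

Callers of `get_prompt_request_pieces()` never see a piece without an objective, because `add_piece` rejects pieces whose configuration is unknown.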

Similarly, an AttackConfiguration class can add methods to help find the start and end of an attack, etc. This doesn't need to be present immediately.
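For example, a helper could derive the start and end of an attack from the timestamps of the prompt entries linked to it. This is a sketch under assumptions: the `time_span` method and the simplified `PromptEntry` type are hypothetical names, not existing PyRIT APIs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class PromptEntry:
    attack_configuration_id: str
    timestamp: datetime


@dataclass
class AttackConfiguration:
    id: str
    conversation_objective: str

    def time_span(self, entries: Iterable[PromptEntry]) -> Optional[Tuple[datetime, datetime]]:
        """Return (start, end) over the entries linked to this attack, or None if there are none."""
        stamps = [e.timestamp for e in entries if e.attack_configuration_id == self.id]
        if not stamps:
            return None
        return min(stamps), max(stamps)
```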

There will likely be nuances as we plumb this through, but there are a lot of advantages. E.g. we can get rid of task for score methods, because we can just use the conversation_objective.
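To illustrate the scoring point, here is a sketch of a scoring helper that takes only the piece and reads the objective from it, with no separate task argument. `build_scoring_prompt` and the simplified `PromptRequestPiece` are hypothetical and only show the shape of the signature.

```python
from dataclasses import dataclass


@dataclass
class PromptRequestPiece:
    converted_value: str
    conversation_objective: str  # assumed to be populated by the memory layer


def build_scoring_prompt(piece: PromptRequestPiece) -> str:
    """Build the text sent to a scorer; the objective comes from the piece itself."""
    return (
        f"Objective: {piece.conversation_objective}\n"
        f"Response: {piece.converted_value}\n"
        "Did the response achieve the objective? Answer True or False."
    )
```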

This is a hard first issue to tackle. If anyone volunteers, please work closely with the core team. Else, we'll probably tackle it.

ref: #724

@romanlutz changed the title from "FEAT: Add Objectives to converstations" to "FEAT: Add Objectives to conversations" on Feb 20, 2025
@imranbohoran

@rlundeen2 - Thinking of this through another lens, what are your thoughts on having a separate attacks table? That table can hold data pertaining to an attack (i.e. an ID, start and end timestamps, objective, a type (single-turn or multi-turn), perhaps a name, etc.). When pulling out and capturing data for further analysis, start and end times for an attack seemed useful to have. We of course have the timestamp for the prompt entries, but an attack that produces multiple prompts, especially a multi-turn one like Crescendo, can run for a longer period. Every prompt can have a reference back to the attack (ScoreEntry.attack_id -> Attacks.id).
I view the orchestrator as the technique that orchestrates the attack(s), so I was wondering whether an attack can be modelled separately so that all attack instance properties are held together. I'd be interested to hear your thoughts on this.

@rlundeen2
Contributor Author

rlundeen2 commented Feb 24, 2025

I like this approach. I'm going to update the ticket with some adjustments. Before anyone tackles this, I'd love @romanlutz to take a look and help adjust with feedback.

@rlundeen2 changed the title from "FEAT: Add Objectives to conversations" to "FEAT: Add AttackConfiguration table, allowing objectives in memory" on Feb 24, 2025
@romanlutz
Contributor

Nice discussion and ideas! I generally agree with all that was written above, including the updated description of the work item by @rlundeen2. This strikes me as a rather huge item that should probably be handled by us (?)

The TreeOfAttacksWithPruningOrchestrator (short: TAP) used to spin up multiple orchestrators under the hood, but I think you refactored that away, @rlundeen2, right?

@rlundeen2
Contributor Author

Correct, TreeOfAttacks is just one orchestrator now.

@romanlutz added the "enhancement" (New feature or request) label on Feb 25, 2025
@rlundeen2
Contributor Author

rlundeen2 commented Feb 25, 2025

@imranbohoran we're okay with you taking this - honestly it would be a huge contribution! We probably wouldn't get to it for at least six weeks otherwise. It is a tough one to tackle, but we can support any questions here and/or via Discord, and the team is happy with the design above. Do you want us to assign it to you?

@romanlutz
Contributor

... and you can always start a draft PR where we provide instant feedback 🙂 If you want to tackle it, that is.

@imranbohoran

Thanks both. Happy to take this on and love the engagement and support on this. Please go ahead and assign it to me. I'll start with some draft PR(s) as suggested and be in close communication with you folks. Will post updates here, so we have a trail of the conversations.

imranbohoran added a commit to Mindgard/PyRIT that referenced this issue Mar 6, 2025
* Following the conversation at Azure#726
  the AttackConfiguration concept is introduced and tried out with a
  CrescendoOrchestrator and PromptSendingOrchestrator.
* The PromptEntry has a link to the AttackConfiguration and the SeedPrompt
  carries the AttackConfiguration through the function calls.
* This commit is only serving as a PoC of introducing the AttackConfiguration
  concept and the different touch points in the code base for it to work.
* As of this commit, using a CrescendoOrchestrator and PromptSendingOrchestrator
  will result in a populated AttackConfiguration and prompt entries linked to the same.
* This commit does not handle any tests, although certain function arguments have
  been changed.

Co-authored-by: Ayomide Apantaku <[email protected]>
imranbohoran added a commit to Mindgard/PyRIT that referenced this issue Mar 6, 2025
* Following the conversation at Azure#726
  the AttackConfiguration concept is introduced and tried out with a
  CrescendoOrchestrator and PromptSendingOrchestrator.
* The PromptEntry has a link to the AttackConfiguration and the SeedPrompt
  carries the AttackConfiguration through the function calls.
* This commit is only serving as a PoC of introducing the AttackConfiguration
  concept and the different touch points in the code base for it to work.
* As of this commit, using a CrescendoOrchestrator and PromptSendingOrchestrator
  will result in a populated AttackConfiguration and prompt entries linked to the same.
* This commit does not handle any tests, although certain function arguments have
  been changed.

Co-authored-by: Ayomide Apantaku <[email protected]>
@imranbohoran

imranbohoran commented Mar 7, 2025

Hi @romanlutz / @rlundeen2. We've created a draft PR to validate and get feedback on the approach we were thinking of, based on our understanding. There's a write-up in the PR description where we try to capture the highlights. It's a bit of a lengthy PR, given that we wanted to demonstrate an end-to-end slice using 2 orchestrators. We would really appreciate your input/feedback.

3 participants