FEAT: Add AttackConfiguration table, allowing objectives in memory #726

rlundeen2 · 2025-02-19T17:49:41Z

We need to coalesce details about an attack and tie it to PromptRequestPiece. To start, this can be basic and grow over time.

In the database, we should have a new table named AttackConfiguration. A single orchestrator can have multiple AttackConfigurations, but each "objective" would have a single AttackConfiguration - e.g. in PromptSendingOrchestrator, every prompt sent would have a single AttackConfiguration. The AttackConfiguration table looks like:

id
orchestrator_identifier (this should move from PromptMemoryEntry to here)
conversation_objective
labels (this should move from PromptMemoryEntry to here)

PromptMemoryEntry, besides removing labels and removing orchestrator_identifier, should add:

attack_configuration_id
target_type: Literal["scorer_target", "objective_target", "adversarial_target", "other"]
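
A minimal sketch of the two tables described above, using SQLite from the Python standard library only to keep the example self-contained. The column types, the JSON-as-text encoding for `orchestrator_identifier` and `labels`, the `original_value` placeholder column, and the `create_schema` helper are all assumptions for illustration, not the project's actual schema or ORM models.

```python
import sqlite3

# Hypothetical schema sketch. Table and column names follow the issue text;
# the column types and the CHECK constraint are assumptions.
SCHEMA = """
CREATE TABLE AttackConfiguration (
    id TEXT PRIMARY KEY,
    orchestrator_identifier TEXT,  -- moved here from PromptMemoryEntry (JSON as text)
    conversation_objective TEXT,
    labels TEXT                    -- moved here from PromptMemoryEntry (JSON as text)
);
CREATE TABLE PromptMemoryEntry (
    id TEXT PRIMARY KEY,
    original_value TEXT,           -- placeholder for the existing prompt columns
    attack_configuration_id TEXT REFERENCES AttackConfiguration(id),
    target_type TEXT CHECK (target_type IN
        ('scorer_target', 'objective_target', 'adversarial_target', 'other'))
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)
```

With this layout, each prompt entry points at exactly one AttackConfiguration row, and the objective is stored once per attack rather than once per prompt.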

We likely want to add more information to this table, but we can postpone that and keep it as simple as we can to start.

PromptRequestPiece should put all this together automatically, like how we do with scores today, and include the various pieces (conversation_objective, orchestrator_identifier, labels). In other words, when MemoryInterface returns PromptRequestPiece objects, they should always have a populated conversation_objective. This needs to be present for MVP.
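
One way to picture the "put all this together automatically" step is a small hydration pass that copies the attack-level fields onto each piece when it is read back from memory. The classes below are simplified stand-ins, not PyRIT's actual `PromptRequestPiece` or memory classes, and `hydrate_pieces` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AttackConfiguration:
    """Simplified stand-in for a row of the proposed table."""
    id: str
    conversation_objective: str
    orchestrator_identifier: Dict[str, str] = field(default_factory=dict)
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class PromptRequestPiece:
    """Simplified stand-in; the real class has many more fields."""
    id: str
    original_value: str
    attack_configuration_id: str
    conversation_objective: Optional[str] = None
    orchestrator_identifier: Dict[str, str] = field(default_factory=dict)
    labels: Dict[str, str] = field(default_factory=dict)


def hydrate_pieces(
    pieces: List[PromptRequestPiece],
    configs_by_id: Dict[str, AttackConfiguration],
) -> List[PromptRequestPiece]:
    """Copy the attack-level fields onto every piece (hypothetical helper)."""
    for piece in pieces:
        config = configs_by_id[piece.attack_configuration_id]
        piece.conversation_objective = config.conversation_objective
        piece.orchestrator_identifier = dict(config.orchestrator_identifier)
        piece.labels = dict(config.labels)
    return pieces
```

Callers would then read `piece.conversation_objective` directly, without knowing that it lives in a separate table.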

Similarly, an AttackConfiguration class can add methods to help find start and end of attack, etc. This doesn't need to be present immediately.
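
As a rough illustration of such a helper, the start and end of an attack could be derived from the timestamps of its prompt entries. The function name `attack_time_span` and the idea of passing in plain `datetime` values are assumptions; where this logic would actually live is left open above.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def attack_time_span(
    timestamps: Iterable[datetime],
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) for one attack's prompt timestamps, or None if empty."""
    stamps = list(timestamps)
    if not stamps:
        return None
    # The earliest prompt marks the start, the latest marks the end.
    return min(stamps), max(stamps)
```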

There will likely be nuances as we plumb this through. But there are a lot of advantages. E.g. we can get rid of task for score methods, because we can just use the conversation_objective.

This is a hard first issue to tackle. If anyone volunteers, please work closely with the core team. Else, we'll probably tackle it.

ref: #724

imranbohoran · 2025-02-23T21:31:04Z

@rlundeen2 - Thinking of this through another lens, what are your thoughts on having a separate attacks table? That table can hold data pertaining to an attack (i.e. an ID, start and end timestamps, objective, a type (single-turn or multi-turn), perhaps a name, etc.). When pulling out and capturing data for further analysis, start and end times for an attack seemed useful to have. We of course have the timestamp for the prompt entries, but an attack that produces multiple prompts, especially a multi-turn one like Crescendo, can run for a longer period. Every prompt can have a reference back to the attack (ScoreEntry.attack_id -> Attacks.id).
I view the orchestrator as the technique that orchestrates the attack(s). So I was wondering whether an attack can be modelled separately, so that all attack instance properties are held together. I would be interested to hear your thoughts on this.
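
A sketch of what one row of such an attacks table might carry, written as a plain dataclass. The class name `AttackRecord`, the field names, and the `duration` helper are illustrative assumptions based on the properties listed in this comment, not an agreed design.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Literal, Optional


@dataclass
class AttackRecord:
    """Hypothetical shape of one attack instance."""
    objective: str
    attack_type: Literal["single-turn", "multi-turn"]
    start_time: datetime
    end_time: Optional[datetime] = None  # unset while the attack is still running
    name: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def duration(self) -> Optional[timedelta]:
        """Elapsed time of the attack, or None if it has not finished."""
        if self.end_time is None:
            return None
        return self.end_time - self.start_time
```

Storing the timestamps on the attack itself avoids recomputing them from individual prompt entries during offline analysis.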

rlundeen2 · 2025-02-24T02:24:04Z

I like this approach. I'm going to update the ticket with some adjustments. Before anyone tackles this I'd love @romanlutz to take a look and help adjust with feedback

romanlutz · 2025-02-24T23:47:46Z

Nice discussion and ideas! I generally agree with all that was written above, including the updated description of the work item by @rlundeen2. This strikes me as a rather huge item and should probably be handled by us (?)

The TreeOfAttacksWithPruningOrchestrator (short: TAP) used to spin up multiple orchestrators under the hood, but I think you refactored that away @rlundeen2, right?

rlundeen2 · 2025-02-25T00:15:33Z

Correct, TreeOfAttacks is just one orchestrator now

rlundeen2 · 2025-02-25T17:46:18Z

@imranbohoran we're okay with you taking this - honestly it would be a huge contribution! We probably wouldn't get to it in at least six weeks otherwise. It is a tough one to tackle but we can also support any questions here and/or via Discord - and the team is happy with the design above. Do you want us to assign to you?

romanlutz · 2025-02-26T04:25:46Z

... and you can always start a draft PR where we provide instant feedback 🙂 If you want to tackle it that is.

imranbohoran · 2025-02-26T09:44:06Z

Thanks both. Happy to take this on and love the engagement and support on this. Please go ahead and assign it to me. I'll start with some draft PR(s) as suggested and be in close communication with you folks. Will post updates here, so we have a trail of the conversations.

* Following the conversation at Azure#726, the AttackConfiguration concept is introduced and tried out with a CrescendoOrchestrator and PromptSendingOrchestrator.
* The PromptEntry has a link to the AttackConfiguration, and the SeedPrompt carries the AttackConfiguration through the function calls.
* This commit only serves as a PoC of introducing the AttackConfiguration concept and the different touch points in the code base for it to work.
* As of this commit, using a CrescendoOrchestrator and PromptSendingOrchestrator will result in a populated AttackConfiguration and prompt entries linked to the same.
* This commit does not update any tests, although certain function arguments have been changed.

Co-authored-by: Ayomide Apantaku <[email protected]>

imranbohoran · 2025-03-07T09:44:02Z

Hi @romanlutz/ @rlundeen2. We've created a draft PR to validate/get feedback on the approach we were thinking based on our understanding. There's a write-up in the PR description where we try to capture the highlights. It's a bit of a lengthy PR given that we wanted to demonstrate an end-to-end slice using 2 orchestrators. Would really appreciate your inputs/feedback.

rlundeen2 mentioned this issue Feb 19, 2025

FEAT: Improve prompts reference to objectives for offline results analysis #724

Closed

romanlutz changed the title ~~FEAT: Add Objectives to converstations~~ FEAT: Add Objectives to conversations Feb 20, 2025

rlundeen2 changed the title ~~FEAT: Add Objectives to conversations~~ FEAT: Add AttackConfiguration table, allowing objectives in memory Feb 24, 2025

romanlutz added the enhancement New feature or request label Feb 25, 2025

romanlutz assigned imranbohoran Feb 27, 2025

imranbohoran mentioned this issue Mar 7, 2025

[Proof of Concept] Introducing AttackConfigurations #766

Draft
