FEAT: Add AttackConfiguration table, allowing objectives in memory #726
Comments
@rlundeen2 - Thinking of this through another lens, what are your thoughts on having a separate
I like this approach. I'm going to update the ticket with some adjustments. Before anyone tackles this I'd love @romanlutz to take a look and help adjust with feedback.
Nice discussion and ideas! I generally agree with all that was written above, including the updated description of the work item by @rlundeen2. This strikes me as a rather huge item and should probably be handled by us (?) The
Correct, TreeOfAttacks is just one orchestrator now.
@imranbohoran we're okay with you taking this - honestly it would be a huge contribution! We probably wouldn't get to it in at least six weeks otherwise. It is a tough one to tackle but we can also support any questions here and/or via Discord - and the team is happy with the design above. Do you want us to assign to you?
... and you can always start a draft PR where we provide instant feedback 🙂 If you want to tackle it, that is.
Thanks both. Happy to take this on and love the engagement and support on this. Please go ahead and assign it to me. I'll start with some draft PR(s) as suggested and be in close communication with you folks. Will post updates here, so we have a trail of the conversations.
* Following the conversation at Azure#726, the AttackConfiguration concept is introduced and tried out with a CrescendoOrchestrator and PromptSendingOrchestrator.
* The PromptEntry has a link to the AttackConfiguration and the SeedPrompt carries the AttackConfiguration through the function calls.
* This commit only serves as a PoC of introducing the AttackConfiguration concept and the different touch points in the code base for it to work.
* As of this commit, using a CrescendoOrchestrator and PromptSendingOrchestrator will result in a populated AttackConfiguration and prompt entries linked to the same.
* This commit does not handle any tests, even though certain function arguments have been changed.
Co-authored-by: Ayomide Apantaku <[email protected]>
Hi @romanlutz / @rlundeen2. We've created a draft PR to validate and get feedback on the approach we had in mind, based on our understanding. There's a write-up in the PR description where we try to capture the highlights. It's a bit of a lengthy PR given that we wanted to demonstrate an end-to-end slice using 2 orchestrators. Would really appreciate your inputs/feedback.
We need to coalesce details about an attack and tie it to PromptRequestPiece. To start, this can be basic, and it can grow over time.
In the database, we should have a new table named AttackConfiguration. A single orchestrator can have multiple AttackConfigurations, but each "objective" would have a single AttackConfiguration - e.g. in PromptSendingOrchestrator, every prompt sent would have a single AttackConfiguration. The AttackConfiguration table looks like:
PromptMemoryEntry, besides removing labels and removing orchestrator_identifier, should add:
We likely want to add more information to this table. But maybe postpone this and keep as simple as we can to start.
PromptRequestPiece should put all this together automatically, like how we do with scores today, and include the various pieces (conversation_objective, orchestrator_identifier, labels). In other words, when memoryInterface returns PromptRequestPiece objects, each should always have a populated conversation_objective. This needs to be present for MVP.
Similarly, an AttackConfiguration class can add methods to help find the start and end of an attack, etc. This doesn't need to be present immediately.
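To make the "put all this together automatically" idea concrete, here is a minimal in-memory sketch. It is not PyRIT's actual API or schema: the class SketchMemory, the attack_configuration_id link, and every field other than conversation_objective, orchestrator_identifier, and labels are assumptions made up for illustration. It only shows the general shape of storing one AttackConfiguration per objective and joining it onto each piece on read.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class AttackConfiguration:
    # Hypothetical shape: one instance per objective.
    objective: str
    orchestrator_identifier: Dict[str, str]
    labels: Dict[str, str] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class PromptMemoryEntry:
    # Stored row: no labels or orchestrator_identifier, only a link (assumed name).
    conversation_id: str
    role: str
    content: str
    attack_configuration_id: str


@dataclass
class PromptRequestPiece:
    # What callers receive: the entry plus the joined attack details.
    conversation_id: str
    role: str
    content: str
    conversation_objective: str
    orchestrator_identifier: Dict[str, str]
    labels: Dict[str, str]


class SketchMemory:
    """Hypothetical stand-in for a memory interface, backed by plain Python containers."""

    def __init__(self) -> None:
        self._configs: Dict[str, AttackConfiguration] = {}
        self._entries: List[PromptMemoryEntry] = []

    def add_attack_configuration(self, config: AttackConfiguration) -> None:
        self._configs[config.id] = config

    def add_entry(self, entry: PromptMemoryEntry) -> None:
        # Reject entries that point at an unknown configuration.
        if entry.attack_configuration_id not in self._configs:
            raise ValueError("unknown attack_configuration_id")
        self._entries.append(entry)

    def get_pieces(self, conversation_id: str) -> List[PromptRequestPiece]:
        # Join each stored entry with its AttackConfiguration on read,
        # so every returned piece carries a populated conversation_objective.
        pieces = []
        for entry in self._entries:
            if entry.conversation_id != conversation_id:
                continue
            config = self._configs[entry.attack_configuration_id]
            pieces.append(
                PromptRequestPiece(
                    conversation_id=entry.conversation_id,
                    role=entry.role,
                    content=entry.content,
                    conversation_objective=config.objective,
                    orchestrator_identifier=dict(config.orchestrator_identifier),
                    labels=dict(config.labels),
                )
            )
        return pieces


# Usage: two entries in one conversation share a single AttackConfiguration.
memory = SketchMemory()
config = AttackConfiguration(
    objective="example objective",
    orchestrator_identifier={"__type__": "PromptSendingOrchestrator"},
    labels={"op": "demo"},
)
memory.add_attack_configuration(config)
memory.add_entry(PromptMemoryEntry("conv-1", "user", "hello", config.id))
memory.add_entry(PromptMemoryEntry("conv-1", "assistant", "hi", config.id))
pieces = memory.get_pieces("conv-1")
```

In a real database the join would presumably be a foreign key from the prompt entry table to the AttackConfiguration table; the dictionary lookup here only stands in for that.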
There will likely be nuances as we plumb this through. But there are a lot of advantages. E.g. we can get rid of task for score methods, because we can just use the conversation_objective.
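As a rough illustration of that simplification, the sketch below contrasts a scoring helper that needs a separate task argument with one that reads the objective from the piece itself. The function names, the prompt wording, and the trimmed-down PromptRequestPiece are hypothetical and are not taken from the existing scorer code.

```python
from dataclasses import dataclass


@dataclass
class PromptRequestPiece:
    # Trimmed-down, hypothetical piece: only the fields this sketch needs.
    content: str
    conversation_objective: str


def build_judge_prompt_with_task(piece: PromptRequestPiece, task: str) -> str:
    # Today-style signature: the caller must pass the task alongside the piece.
    return (
        f"Objective: {task}\n"
        f"Response: {piece.content}\n"
        "Did the response achieve the objective?"
    )


def build_judge_prompt(piece: PromptRequestPiece) -> str:
    # Proposed-style signature: the objective travels with the piece.
    return build_judge_prompt_with_task(piece, task=piece.conversation_objective)


piece = PromptRequestPiece(
    content="example response",
    conversation_objective="example objective",
)
prompt = build_judge_prompt(piece)
```

Because the objective is always populated on the piece, callers no longer have to keep the task in sync with the conversation being scored.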
This is a hard first issue to tackle. If anyone volunteers, please work closely with the core team. Else, we'll probably tackle it.
ref: #724