
FEAT: Integrate XPIATestOrchestrator with the AI Recruiter #684

Open · wants to merge 15 commits into main
Conversation

@KutalVolkan (Contributor) commented Feb 2, 2025

The AI Recruiter is now fully functional with a FastAPI server, allowing us to upload PDFs and compare candidates’ résumés against job descriptions.

The previous raw HTTP approach struggled with parsing, formatting, and multipart uploads, making integration a challenge. I couldn’t get the old feature to work properly, so I did the next best thing—added a new feature instead! 😅 But don’t worry, I kept backward compatibility—no features were harmed in the process!

Now, HTTPTarget fully supports the AI Recruiter, enabling seamless automated CV uploads and candidate evaluation.

I also updated the Docker setup to simplify deployment—be sure to run it before testing the ai_recruiter_demo.ipynb. You can find it on GitHub: https://github.com/KutalVolkan/ai_recruiter/tree/main/docker_setup

Next Steps:

  • Ensure full functionality of XPIATestOrchestrator (this may require reorganizing ai_recruiter_demo.ipynb).
  • Clean up code and update docstrings.
  • Convert the notebook into a .py script.
  • Modify the prompt injection technique:
    • Update injection_items and insert relevant skills, education, and qualifications based on the job description.
  • Write tests for the new HTTPTarget features.
  • Write a PyRIT blog post covering the setup, the idea behind it, and the results.
  • Perform integration testing for the AI Recruiter Demo, if desired 😄
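The `injection_items` step above could be sketched roughly as follows. This is a hypothetical illustration: the function name, the skill list, and the naive keyword matching are all stand-ins, not the PR's actual implementation.

```python
# Hypothetical sketch of building injection_items from a job description.
# Keyword extraction here is naive matching against a known skill list.
KNOWN_SKILLS = ["Python", "FastAPI", "Docker", "Azure", "Kubernetes"]

def build_injection_items(job_description: str) -> list[str]:
    """Collect hidden-text snippets tailored to the job description."""
    matched = [s for s in KNOWN_SKILLS if s.lower() in job_description.lower()]
    return [
        f"Relevant skills: {', '.join(matched)}",
        "Education: MSc in Computer Science",  # static placeholder item
    ]

items = build_injection_items("We need a Python dev with FastAPI and Docker experience.")
```

The returned snippets would then be overlaid as hidden text in the generated CV so they match what the recruiter's job description asks for.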

Related Issue:
#541


More Information about the AI Recruiter:

@rlundeen2 (Contributor) left a comment:

This is all so cool @KutalVolkan! I love this scenario!

I left some comments. One additional non-comment: I'd love to have this written up on the blog, if you're interested in either writing it or having us write it!

@romanlutz (Contributor) left a comment:

I think the entire example is super cool. We should 100% have it here. Amazing work!

A few thoughts, though:

  • maybe we should have an xpia directory? @rlundeen2 wdyt?
  • This heavily depends on the AI recruiter application. That means running this notebook depends on having that repo's code downloaded and running (and perhaps certain resources set up?). We need instructions for that in the notebook and I'm thinking an integration test. You might have seen other PRs with integration tests recently. If you don't know what I mean please lmk and I'll point you the right way.

@KutalVolkan (Contributor, Author) commented:

Hello @rlundeen2 & @romanlutz ,

When running `pre-commit run --all`, I encountered a MyPy type-checking error in doc/code/orchestrators/3_xpia_orchestrator.py at line 192. The issue is an incompatible type assignment: `HTTPXApiTarget` is being assigned to a variable that expects `SemanticKernelPluginAzureOpenAIPromptTarget`.

I’ve tried troubleshooting, but I haven't been able to resolve it yet. Any suggestions?

Thanks!

@KutalVolkan (Contributor, Author) commented:

  • If you don't know what I mean please lmk and I'll point you the right way.

Hello Roman,

If you don’t mind, could you provide some pointers on where to look? Otherwise, I’ll figure it out myself. :)

@KutalVolkan KutalVolkan marked this pull request as ready for review February 7, 2025 19:08
@romanlutz (Contributor) commented:

  • If you don't know what I mean please lmk and I'll point you the right way.

Hello Roman,

If you don’t mind, could you provide some pointers on where to look? Otherwise, I’ll figure it out myself. :)

The tests under tests/unit all run locally only. The integration tests are under tests/integration and we run these separately with actual LLM endpoints etc. It would be really cool if we could have an integration test that runs this scenario, but it would require starting this service locally, of course.

FWIW integration tests are brand new here and we're just in the process of adding a bunch of them to cover as much as we can, including notebook examples.

@romanlutz (Contributor) commented:

Hello @rlundeen2 & @romanlutz ,

When running `pre-commit run --all`, I encountered a MyPy type-checking error in doc/code/orchestrators/3_xpia_orchestrator.py at line 192. The issue is an incompatible type assignment: `HTTPXApiTarget` is being assigned to a variable that expects `SemanticKernelPluginAzureOpenAIPromptTarget`.

I’ve tried troubleshooting, but I haven't been able to resolve it yet. Any suggestions?

Thanks!

My guess is that you're reusing the same variable name for the processing_target. Can we try calling them something specific for each of the examples? semantic_kernel_processing_target and httpx_api_processing_target or something like that.
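The suggestion above matches how mypy behaves: it infers a variable's type from its first assignment, so reassigning a second, unrelated class to the same name is flagged. A toy reproduction (the classes below are stand-ins, not the actual PyRIT types):

```python
# Toy reproduction of the mypy complaint; classes are stand-ins, not PyRIT types.
class SemanticKernelTarget:
    pass

class HTTPXTarget:
    pass

# mypy infers the variable's type from its first assignment, so reusing one
# name for both targets triggers "incompatible types in assignment":
#     processing_target = SemanticKernelTarget()
#     processing_target = HTTPXTarget()  # error under mypy

# One distinct name per target keeps both assignments well-typed:
semantic_kernel_processing_target = SemanticKernelTarget()
httpx_api_processing_target = HTTPXTarget()
```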

@KutalVolkan (Contributor, Author) commented Feb 8, 2025:

  • If you don't know what I mean please lmk and I'll point you the right way.

Hello Roman,
If you don’t mind, could you provide some pointers on where to look? Otherwise, I’ll figure it out myself. :)

The tests under tests/unit all run locally only. The integration tests are under tests/integration and we run these separately with actual LLM endpoints etc. It would be really cool if we could have an integration test that runs this scenario, but it would require starting this service locally, of course.

FWIW integration tests are brand new here and we're just in the process of adding a bunch of them to cover as much as we can, including notebook examples.

Hello Roman,

I uploaded the integration test. It is ready for review. Navigate to `PyRIT\tests\integration\ai_recruiter` and run:

pytest .\test_ai_recruiter.py -s

Note: I use OpenAI models and endpoints for the AI recruiter.
Update: Switched to using AzureOpenAI endpoints and deployments.


@KutalVolkan KutalVolkan changed the title [DRAFT] FEAT: Integrate XPIATestOrchestrator with the AI Recruiter FEAT: Integrate XPIATestOrchestrator with the AI Recruiter Feb 22, 2025
@romanlutz (Contributor) left a comment:

Amazing work, @KutalVolkan !

IMO this is already in a very good state. I left one comment about how we'll deal with Docker in the integration tests that I want the other maintainers to chime in on, but otherwise I think this is ready to go (assuming the smaller comments are addressed, of course).

Review comment:

This is missing the corresponding Python file. We have everything as ipynb AND py

Review comment:

nit: undo orchestrator changes if you didn't make real changes 🙂

@@ -118,4 +118,135 @@
memory = CentralMemory.get_memory_instance()
memory.dispose_engine()

# %% [markdown]
# RAG Vulnerability Demonstration. In this demonstration, we use the **XPIATestOrchestrator** to generate a CV (overlaying hidden text for “Relevant skills”) and automatically upload it to `/upload/`, then trigger `/search_candidates/` for evaluation.
Review comment:

super nit:

Suggested change
# RAG Vulnerability Demonstration. In this demonstration, we use the **XPIATestOrchestrator** to generate a CV (overlaying hidden text for “Relevant skills”) and automatically upload it to `/upload/`, then trigger `/search_candidates/` for evaluation.
# ## RAG Vulnerability Demonstration.
# In this demonstration, we use the **XPIATestOrchestrator** to generate a CV (overlaying hidden text for “Relevant skills”) and automatically upload it to `/upload/`, then trigger `/search_candidates/` for evaluation.

@@ -323,8 +323,10 @@ async def _serialize_pdf(self, pdf_bytes: bytes, content: str):
Returns:
DataTypeSerializer: The serializer object containing metadata about the saved file.
"""
original_filname_ending = self._existing_pdf_path.stem if self._existing_pdf_path else ""
Review comment:

Suggested change
original_filname_ending = self._existing_pdf_path.stem if self._existing_pdf_path else ""
original_filename_ending = self._existing_pdf_path.stem if self._existing_pdf_path else ""

nit typo

logger = logging.getLogger(__name__)


class HTTPXApiTarget(HTTPTarget):
Review comment:

Not crucial in any way but... API is an acronym and I like capitalizing those. Just like HTTPX. But HTTPXAPITarget is... a choice 😆

"""

# Start containers in the background
subprocess.run(["docker-compose", "up", "-d", "--build"], check=True)
Review comment:

This assumes we have docker-compose locally. That may be a stretch for our integration test pipeline (which already runs in a container from what I understand).

We could

  • skip this one in our pipeline BUT that means we have to run it locally every so often to make sure it doesn't break. NOT IDEAL but maybe a first step (?)
  • deploy this as a service in Azure (Azure Container Service probably?) at setup time and then run it against that. Definitely brings lots of complexity with it... we'd need to make sure it's secure, and also the service principal running our pipeline needs those permissions (creating ACS, for one). A lot of this is on us, of course, since we maintain the infrastructure for integration tests.

I think I'd be happy going with the former and then setting up a task for us to go to the latter, too. I think this has tremendous potential even as a blueprint for a generic service where we replace the contents with other XPIA scenarios.

Looking for input from @rlundeen2 @nina-msft @eugeniavkim @jbolor21 @jsong468 - any thoughts?
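The first option above (skipping in the pipeline) could look roughly like this. The marker name and test body are hypothetical sketches, not the PR's actual test code:

```python
import shutil

import pytest

# Skip the integration test when docker-compose isn't on PATH, e.g. in a
# containerized CI pipeline that can't spawn sibling containers.
requires_docker_compose = pytest.mark.skipif(
    shutil.which("docker-compose") is None,
    reason="docker-compose is not available in this environment",
)

@requires_docker_compose
def test_ai_recruiter_end_to_end():
    # Placeholder body; the real test would run `docker-compose up -d --build`
    # and exercise the /upload/ and /search_candidates/ endpoints.
    pass
```

With this marker the test still runs locally for anyone who has docker-compose, while the pipeline skips it cleanly instead of failing.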

from pyrit.prompt_target.http_target.httpx_api_target import HTTPXApiTarget

# Initialize PyRIT to set up the central memory instance.
initialize_pyrit(memory_db_type=DUCK_DB)
Review comment:

please use a fixture for this. We have an existing one called "patch_central_database" that should do this

Review comment:

Where did you get all these from? 😆

initialize_pyrit(memory_db_type=DUCK_DB)


# Dummy transport to simulate a successful file upload response.
Review comment:

nit: let's rename from "dummy" to "mock"

Comment on lines +42 to +44
target = HTTPXApiTarget(
http_url="http://example.com/upload/", method="POST", timeout=180, **{"transport": transport}
)
Review comment:

Unit tests should not make network calls. This should be patched.
