Expose readiness/liveness probes #24

Open
khvn26 opened this issue Feb 25, 2025 · 2 comments

Comments

@khvn26
Member

khvn26 commented Feb 25, 2025

Currently we expect orchestrators to execute a resource-heavy management command for each check. This might cause resource starvation on smaller clusters. In fact, we need to expose two health checks, preferably as HTTP endpoints:

  • A simple liveness probe that confirms the task processor is running.
  • A readiness probe that makes sure the database connections are healthy.
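
To illustrate the distinction, here is a minimal sketch using only the Python standard library. It is not the task processor's actual implementation: `probe_status`, `database_is_healthy`, and `make_handler` are hypothetical names, and an SQLite connection stands in for the real database. The endpoint paths are assumed to follow the `/health/liveness` and `/health/readiness` convention.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler


def database_is_healthy(connection: sqlite3.Connection) -> bool:
    """Return True if a trivial query succeeds on the given connection."""
    try:
        connection.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


def probe_status(path: str, connection: sqlite3.Connection) -> int:
    """Map a request path to the HTTP status code a probe should return."""
    if path == "/health/liveness":
        # Liveness: being able to answer at all means the process is running.
        return 200
    if path == "/health/readiness":
        # Readiness: report ready only when the database responds.
        return 200 if database_is_healthy(connection) else 503
    return 404


def make_handler(connection: sqlite3.Connection) -> type:
    """Build a request handler class bound to one database connection."""

    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            self.send_response(probe_status(self.path, connection))
            self.end_headers()

    return ProbeHandler
```

The handler could be served with `http.server.HTTPServer(("", 8000), make_handler(conn)).serve_forever()`; the port is arbitrary. The point is that liveness never touches the database, so a database outage takes a pod out of rotation without triggering a restart.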
@khvn26 khvn26 changed the title Expose an HTTP health check Expose readiness/liveness probes Feb 25, 2025
@rolodato
Member

+1 to separate liveness and readiness checks - see Flagsmith/flagsmith#5151 for an example.

I would also suggest serving both checks over HTTP, so that no new processes need to be spawned on the pods to run the health checks.

@khvn26
Member Author

khvn26 commented Feb 25, 2025

With task-processor being a Django app installed on top of Core API, task-processor containers are in fact capable of using Core API's readiness and liveness probes introduced in Flagsmith/flagsmith#5151.

The following PRs need to be merged before we close this:

An overall summary of the changes across the two PRs:

  1. The original checktaskprocessorthreadhealth implementation is removed completely. Running python manage.py checktaskprocessorthreadhealth now makes an HTTP request to /health/liveness instead.
  2. The runprocessor management command now runs a Gunicorn server, and accepts Gunicorn command line arguments in addition to Task processor arguments. I'm hoping we'll adopt this code for the main Core API entrypoint as well, as this will make the application more portable. This will also allow us to serve the Prometheus endpoint directly from the Task processor containers.
  3. When running runprocessor, the following HTTP endpoints are now exposed:
       • /health/liveness
       • /health/readiness
       • /version
       • /processor/monitoring
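
As a rough illustration of item 1, a health check that makes an HTTP request instead of inspecting threads could look like the sketch below. This is not the code from the PRs: `probe` and `exit_code` are hypothetical helpers, and the base URL passed in is an assumption about where the processor's server listens.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses
        # (urlopen raises HTTPError, a URLError subclass, for those).
        return False


def exit_code(base_url: str, path: str = "/health/liveness") -> int:
    """Translate a probe result into a process exit code (0 = healthy)."""
    return 0 if probe(base_url.rstrip("/") + path) else 1
```

A wrapper command would then end with something like `sys.exit(exit_code("http://localhost:8000"))`, which keeps the check cheap: one short-lived HTTP request rather than a full application bootstrap.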

The minor changes and improvements include the following:

  • Task processor documentation refresh — the monitoring docs are updated and the entrypoint reference now includes environment variables.
  • Task processor Docker entrypoint now accepts TASK_PROCESSOR_GRACE_PERIOD_MS in addition to TASK_PROCESSOR_GRACE_PERIOD.
  • Docker Compose healthcheck is removed in favour of a Dockerfile HEALTHCHECK directive, which is supported across a variety of orchestrating platforms (kudos to @rolodato for this)
  • scripts/healthcheck.py, which was used as a Compose healthcheck, is deprecated.

One more thing: Flagsmith/flagsmith-common#13 modernises flagsmith-common packaging, introduces typing, and integrates APIs now common to Task processor and Core API. The next step after merging it will be porting flagsmith-task-processor over to flagsmith-common, which will reduce the maintenance overhead for our public common dependencies.
