Expose readiness/liveness probes #24

khvn26 · 2025-02-25T11:40:34Z

Currently we expect orchestrators to execute a resource-heavy management command for each check. This might cause resource starvation on smaller clusters. In fact, we need to expose two health checks, preferably as HTTP endpoints:

A simple liveness probe that means the task processor is running.
A readiness probe that makes sure the database connections are healthy.
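To illustrate the split between the two probes, here is a minimal, framework-free sketch. The function names, the response bodies, and the use of `sqlite3` as a stand-in database are assumptions made for illustration only; this is not the task processor's actual implementation.

```python
import sqlite3
from http import HTTPStatus
from typing import Callable, Iterable


def liveness() -> tuple[int, dict]:
    """Liveness: being able to answer at all is the signal, so no dependency is touched."""
    return HTTPStatus.OK, {"status": "ok"}


def readiness(checks: Iterable[Callable[[], None]]) -> tuple[int, dict]:
    """Readiness: run every dependency check and report 503 on the first failure."""
    for check in checks:
        try:
            check()
        except Exception as exc:  # any failing dependency means "not ready"
            return HTTPStatus.SERVICE_UNAVAILABLE, {"status": "error", "detail": str(exc)}
    return HTTPStatus.OK, {"status": "ok"}


def make_db_check(conn: sqlite3.Connection) -> Callable[[], None]:
    """Hypothetical database check: a trivial query that fails if the connection is unusable."""

    def check() -> None:
        conn.execute("SELECT 1").fetchone()

    return check
```

The point of the split is that an orchestrator would restart a container only when liveness fails, while a failing readiness check merely marks the container as temporarily not ready.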

rolodato · 2025-02-25T12:18:24Z

+1 to separate liveness and readiness checks - see Flagsmith/flagsmith#5151 for an example.

I would also suggest serving both checks over HTTP to avoid spawning new processes on the pods to run the health checks.

khvn26 · 2025-02-25T12:54:03Z

With task-processor being a Django app installed on top of Core API, task-processor containers are in fact capable of using Core API's readiness and liveness probes introduced in Flagsmith/flagsmith#5151.

The following PRs need to be merged before we close this:

feat: Add HTTP server, remove unhealthy thread monitoring in favour of logging #25 removes the management command, and adds a Gunicorn server to the Task processor entrypoint.
feat: Switch existing task processor health checks to new liveness probe flagsmith#5161 redirects the management command call to the new liveness probe, and makes sure the health check endpoints are exposed for Task processor.

An overall summary of the changes across the two PRs:

The checktaskprocessorthreadhealth management command is removed from the Task processor (#25). With flagsmith#5161, running python manage.py checktaskprocessorthreadhealth makes an HTTP request to /health/liveness instead.
The runprocessor management command now runs a Gunicorn server, and accepts Gunicorn command line arguments in addition to Task processor arguments. I'm hoping we'll adopt this code for the main Core API entrypoint as well, as this will make the application more portable. This will also allow us to serve the Prometheus endpoint directly from the Task processor containers.
When running runprocessor, the following HTTP endpoints are now exposed:

/health/liveness
/health/readiness
/version
/processor/monitoring
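As a rough sketch of what an HTTP-based check against these endpoints could look like, the snippet below requests one of the health paths and treats any 2xx response as healthy. The `probe` helper, the base URL, the timeout, and the "2xx means healthy" rule are assumptions for illustration; the thread does not specify the port or the response format.

```python
import urllib.error
import urllib.request

# Skip any system-wide proxy settings: a health check should talk to the local server directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(base_url: str, path: str = "/health/liveness", timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status, False otherwise."""
    url = base_url.rstrip("/") + path
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses (HTTPError).
        return False
```

A caller could then run `probe("http://localhost:8000")` for liveness or `probe("http://localhost:8000", "/health/readiness")` for readiness, where port 8000 is only a placeholder. Unlike a management command, this makes one lightweight request and does not load the whole application in a new process.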

The minor changes and improvements include the following:

Task processor documentation refresh — the monitoring docs are updated and the entrypoint reference now includes environment variables.
Task processor Docker entrypoint now accepts TASK_PROCESSOR_GRACE_PERIOD_MS in addition to TASK_PROCESSOR_GRACE_PERIOD.
Docker Compose healthcheck is removed in favour of a Dockerfile HEALTHCHECK directive, which is supported across a variety of orchestrating platforms (kudos to @rolodato for this).
scripts/healthcheck.py, which was used as a Compose healthcheck, is deprecated.

One More Thing is Flagsmith/flagsmith-common#13, which modernises flagsmith-common packaging, introduces typing and integrates APIs now common to Task processor and Core API. The next step after merging it will be porting flagsmith-task-processor over to flagsmith-common, which will decrease the maintenance overhead for our public common dependencies.

khvn26 changed the title ~~Expose an HTTP health check~~ Expose readiness/liveness probes Feb 25, 2025

khvn26 mentioned this issue Feb 26, 2025

feat: Add healthcheck views + urls, typing, ruff linting, src layout Flagsmith/flagsmith-common#13

Open
