Add K8s liveness/readiness probes for server and ClickHouse #35
Problem
Neither the server Deployment nor the ClickHouse StatefulSet has liveness or readiness probes configured. If either process hangs or becomes unresponsive, K8s won't detect it or automatically restart the container.
Solution
Server (deploy/server.yaml): GET /actuator/health (Spring Boot Actuator, likely already on classpath) or implement GET /api/v1/health (issue #30)
ClickHouse (deploy/clickhouse.yaml): GET /ping on port 8123 (built-in ClickHouse health endpoint, returns "Ok.\n")
Impact
Auto-recovery from hangs/crashes without manual intervention.
Note: ClickHouse is now exposed externally via NodePort (30123 HTTP, 30900 native). Liveness/readiness probes are even more important now that the server depends on ClickHouse being ready before it can initialize its schema. Without a readiness probe on ClickHouse, the server may start before ClickHouse is ready, causing schema init to fail.
Implemented: Added liveness and readiness probes to both deployments.
Server: httpGet /api/v1/health on port 8081 (liveness: 30s initial delay, readiness: 10s)
ClickHouse: httpGet /ping on port 8123 (liveness: 15s initial delay, readiness: 5s)
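For reference, the probe settings described in this comment would look roughly like the fragments below. This is a sketch, not the committed manifests. It assumes the probes sit on the main container in each pod spec, and that the readiness figures (10s and 5s) are initial delays like the liveness ones. Period, timeout, and failure threshold are left at Kubernetes defaults because the issue does not state them.

```yaml
# deploy/server.yaml: fragment for the server container
# (goes under spec.template.spec.containers[]; placement is an assumption)
livenessProbe:
  httpGet:
    path: /api/v1/health
    port: 8081
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /api/v1/health
    port: 8081
  initialDelaySeconds: 10   # assumed to be an initial delay
---
# deploy/clickhouse.yaml: fragment for the ClickHouse container
# (goes under spec.template.spec.containers[]; placement is an assumption)
livenessProbe:
  httpGet:
    path: /ping
    port: 8123
  initialDelaySeconds: 15
readinessProbe:
  httpGet:
    path: /ping
    port: 8123
  initialDelaySeconds: 5    # assumed to be an initial delay
```

A failing liveness probe makes the kubelet restart the container. A failing readiness probe only removes the pod from Service endpoints and does not restart it.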