Kubernetes Probes
Kubernetes uses probes to monitor the health of the containers running in a cluster:
- Readiness probes tell Kubernetes when a container is ready to start accepting traffic. A Pod is marked as ready only when all of its containers are ready.
- Liveness probes tell Kubernetes when a container needs to be restarted. For example, a liveness probe can catch a deadlock, where the application is still running but one or more of its containers are blocked and cannot make progress.
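Both probes in this setup are HTTP GET probes (httpGet). Kubernetes counts a response with a status code of at least 200 and below 400 as a success, and any other status, a timeout or a connection error as a failure. As a minimal sketch of that rule, the Python code below starts a throwaway local health server and probes it with the standard library. The HealthHandler class, its /actuator/health handler and the http_probe helper are illustrative assumptions, not the kubelet's actual implementation.

```python
import http.server
import json
import threading
import urllib.error
import urllib.request


class HealthHandler(http.server.BaseHTTPRequestHandler):
    """Toy health endpoint: 200 with {"status": "UP"} when healthy, 503 with DOWN otherwise."""

    healthy = True  # flipped by the demo below to simulate a failing container

    def do_GET(self):
        if self.path != "/actuator/health":
            self.send_error(404)
            return
        ok = HealthHandler.healthy
        body = json.dumps({"status": "UP" if ok else "DOWN"}).encode()
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def http_probe(url, timeout_seconds):
    """Return True if GET url answers with a status in [200, 400) within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError as err:  # raised for 4xx/5xx responses
        return 200 <= err.code < 400
    except OSError:  # connection refused, timeout, DNS failure
        return False


server = http.server.HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/actuator/health"

healthy_result = http_probe(url, timeout_seconds=10)    # 200 counts as success
HealthHandler.healthy = False
unhealthy_result = http_probe(url, timeout_seconds=10)  # 503 counts as failure
server.shutdown()
server.server_close()

print(healthy_result, unhealthy_result)  # True False
```

The probe only looks at the status code; the JSON body is for humans and tooling, which is why a component reporting DOWN should also return a non-2xx status for the probe to notice it.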
As part of the standard Deployment configuration, Ververica Platform configures default health endpoints for the appmanager and gateway containers. These endpoints enable the Kubernetes readinessProbe and livenessProbe to monitor the behaviour of the running containers.
You can check the health of the appmanager and gateway containers by running a simple curl command from inside the container; see Access the Health Endpoint below.
Defaults
The default values, taken from the Deployment configuration template, are as follows:
```yaml
livenessProbe:
  httpGet:
    path: /actuator/health
    port: management
  initialDelaySeconds: 90
  timeoutSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health
    port: management
  initialDelaySeconds: 10
  timeoutSeconds: 10
```
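The template sets only initialDelaySeconds and timeoutSeconds. Assuming nothing else in the chart overrides the other probe fields, Kubernetes applies its defaults of periodSeconds: 10 and failureThreshold: 3. The rough sketch below uses those assumptions to estimate how long after start a container that hangs from the outset would be restarted by the default liveness probe. The Probe class and seconds_until_failure_action are hypothetical helpers, and the model assumes every probe runs exactly on schedule and fails only after its full timeout.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    initial_delay_seconds: int
    timeout_seconds: int
    period_seconds: int = 10    # Kubernetes default, assumed not overridden by the chart
    failure_threshold: int = 3  # Kubernetes default, assumed not overridden by the chart


def seconds_until_failure_action(probe: Probe) -> int:
    """Rough seconds from container start until the probe's failure action fires.

    Assumes probes start exactly every period_seconds after the initial delay and
    that each one fails only after waiting the full timeout_seconds.
    """
    last_probe_start = probe.initial_delay_seconds + (probe.failure_threshold - 1) * probe.period_seconds
    return last_probe_start + probe.timeout_seconds


default_liveness = Probe(initial_delay_seconds=90, timeout_seconds=10)
shorter_delay = Probe(initial_delay_seconds=30, timeout_seconds=10)

print(seconds_until_failure_action(default_liveness))  # 120
print(seconds_until_failure_action(shorter_delay))     # 60
```

With the defaults this comes to roughly two minutes. Lowering initialDelaySeconds to 30 would halve that, at the risk of restarting a container that is merely slow to start.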
Configuration
To configure alternative endpoints, or to change the delay and timeout values, add a configuration fragment to the main values file of your application (values.yaml by default) and pass that file on the command line when you install or upgrade Ververica Platform:
```shell
helm upgrade --install <release-name> <chart> --values values.yaml
```
The configurable values are the following:
- MANAGEMENT_ENDPOINTS_WEB_BASE_PATH (environment variable): Base path used to build the container endpoints.
- MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH (environment variable): Path of the health endpoint, appended to the base path.
- initialDelaySeconds (probe setting): Number of seconds after the container starts before Kubernetes runs the first readinessProbe or livenessProbe. Probing starts once this delay has passed.
- timeoutSeconds (probe setting): Number of seconds Kubernetes waits for the container to respond to a readinessProbe or livenessProbe. If the container does not respond in time, the probe counts as a failure. When the number of consecutive failures reaches the failure threshold for the probe, the probe's failure behaviour is triggered: a failing readiness probe marks the Pod as not ready, and a failing liveness probe restarts the container.
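The failure threshold counts consecutive failures only: a single successful probe resets the count. The sketch below replays a hypothetical sequence of probe results and reports which probe triggers the failure behaviour. first_trigger and the sample sequences are illustrative, and the threshold of 3 is the Kubernetes default, assumed here because the template above does not set failureThreshold.

```python
from typing import Iterable, Optional


def first_trigger(results: Iterable[bool], failure_threshold: int = 3) -> Optional[int]:
    """Return the 0-based index of the probe whose failure reaches the threshold, or None.

    results holds one entry per probe: True for success, False for a failure
    (including a timeout). Only consecutive failures count; a success resets the run.
    """
    consecutive_failures = 0
    for index, succeeded in enumerate(results):
        if succeeded:
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            return index
    return None


# Runs of failures shorter than the threshold never trigger the failure behaviour...
print(first_trigger([True, False, False, True, False, True]))  # None
# ...but three failures in a row do, on the third of them.
print(first_trigger([True, False, False, False, True]))  # 3
```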
Be aware of the possible impact on probe behaviour before you change the values of initialDelaySeconds and timeoutSeconds.
Liveness probes must be configured carefully so that a failure truly indicates an unrecoverable application failure. Misconfigured liveness probes can also lead to cascading failures.
- initialDelaySeconds must give the container sufficient time to complete its startup tasks, e.g. starting all the necessary processes.
- timeoutSeconds must give the container sufficient time to respond in all conditions short of an unrecoverable deadlock or failure.
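The two rules above can be turned into a quick sanity check. In the sketch below, probe_timing_warnings is a hypothetical helper, and the observed startup time and slowest response time (60 seconds and 2 seconds in the demo) are made-up measurements you would have to take from your own environment. The function only compares them with the configured values; it does not model Kubernetes itself.

```python
def probe_timing_warnings(initial_delay_seconds, timeout_seconds,
                          observed_startup_seconds, observed_max_response_seconds):
    """Return human-readable warnings when a probe leaves no margin over observed behaviour.

    observed_startup_seconds and observed_max_response_seconds are hypothetical
    measurements from your own environment (for example, logs or load tests).
    """
    warnings = []
    if initial_delay_seconds <= observed_startup_seconds:
        warnings.append(
            f"initialDelaySeconds={initial_delay_seconds} does not exceed the observed "
            f"startup time of {observed_startup_seconds}s"
        )
    if timeout_seconds <= observed_max_response_seconds:
        warnings.append(
            f"timeoutSeconds={timeout_seconds} does not exceed the slowest observed "
            f"response of {observed_max_response_seconds}s"
        )
    return warnings


# Default liveness timings against a container that starts in 60s and answers within 2s.
print(probe_timing_warnings(90, 10, observed_startup_seconds=60, observed_max_response_seconds=2))  # []
# Aggressive timings against the same container produce both warnings.
for warning in probe_timing_warnings(10, 1, observed_startup_seconds=60, observed_max_response_seconds=2):
    print(warning)
```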
Example configuration
To configure the settings for the appmanager container, update the configuration values under the appmanager root property in the values Helm file. In this example, the default endpoint is changed from /actuator/health to /appmanager/health, and the probe timings are unchanged:
```yaml
appmanager:
  env:
    - name: "MANAGEMENT_ENDPOINTS_WEB_BASE_PATH"
      value: "/appmanager"
    - name: "MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH"
      value: "/health"
  livenessProbe:
    httpGet:
      path: /appmanager/health # default is /actuator/health
      port: management
    initialDelaySeconds: 90
  readinessProbe:
    httpGet:
      path: /appmanager/health # default is /actuator/health
      port: management
    initialDelaySeconds: 10
```
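The probe path has to match the endpoint the container actually serves, which in this example is the base path followed by the health path mapping. Assuming the two environment variables combine by simple path concatenation, as the /appmanager + /health = /appmanager/health example suggests, a quick consistency check over the appmanager values might look like the sketch below. The values are transcribed into a Python dict by hand, the /actuator and health fallbacks are assumed from the default /actuator/health endpoint, and expected_health_path and mismatched_probes are hypothetical helpers.

```python
def expected_health_path(env):
    """Join the base path and health mapping from a container's env list."""
    values = {item["name"]: item["value"] for item in env}
    # Fallbacks assumed from the default /actuator/health endpoint.
    base = values.get("MANAGEMENT_ENDPOINTS_WEB_BASE_PATH", "/actuator")
    health = values.get("MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH", "health")
    parts = [part.strip("/") for part in (base, health) if part.strip("/")]
    return "/" + "/".join(parts)


def mismatched_probes(component):
    """Return the names of probes whose httpGet.path differs from the expected health path."""
    expected = expected_health_path(component.get("env", []))
    return [
        name
        for name in ("livenessProbe", "readinessProbe")
        if name in component and component[name]["httpGet"]["path"] != expected
    ]


appmanager = {
    "env": [
        {"name": "MANAGEMENT_ENDPOINTS_WEB_BASE_PATH", "value": "/appmanager"},
        {"name": "MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH", "value": "/health"},
    ],
    "livenessProbe": {"httpGet": {"path": "/appmanager/health", "port": "management"}, "initialDelaySeconds": 90},
    "readinessProbe": {"httpGet": {"path": "/appmanager/health", "port": "management"}, "initialDelaySeconds": 10},
}

print(expected_health_path(appmanager["env"]))  # /appmanager/health
print(mismatched_probes(appmanager))            # []
```

A typo in either the environment variables or the probe paths would show up as a non-empty list here, before a mismatched probe starts restarting the container.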
To configure the settings for the gateway container, update the configuration values under the gateway root property in the values Helm file. In this example, the default endpoint is changed from /actuator/health to /gateway/health, and the probe timings are unchanged:
```yaml
gateway:
  env:
    - name: "MANAGEMENT_ENDPOINTS_WEB_BASE_PATH"
      value: "/gateway"
    - name: "MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH"
      value: "/health"
  livenessProbe:
    httpGet:
      path: /gateway/health # default is /actuator/health
      port: management
    initialDelaySeconds: 90
  readinessProbe:
    httpGet:
      path: /gateway/health # default is /actuator/health
      port: management
    initialDelaySeconds: 10
```
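Because the environment variables and the probe paths must agree, it can help to generate the fragment for each container from one place. The sketch below builds the appmanager and gateway fragments shown above as Python dictionaries. component_values is a hypothetical helper, and the result is printed as JSON only for inspection; writing it to values.yaml would need a YAML library, which is not shown here.

```python
import json


def component_values(component, liveness_initial_delay=90, readiness_initial_delay=10):
    """Build the values fragment for one container (e.g. "gateway" or "appmanager")."""
    health_path = f"/{component}/health"

    def probe(initial_delay):
        # Both probes hit the same endpoint; only the initial delay differs.
        return {
            "httpGet": {"path": health_path, "port": "management"},
            "initialDelaySeconds": initial_delay,
        }

    return {
        component: {
            "env": [
                {"name": "MANAGEMENT_ENDPOINTS_WEB_BASE_PATH", "value": f"/{component}"},
                {"name": "MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH", "value": "/health"},
            ],
            "livenessProbe": probe(liveness_initial_delay),
            "readinessProbe": probe(readiness_initial_delay),
        }
    }


values = {}
for name in ("appmanager", "gateway"):
    values.update(component_values(name))

print(json.dumps(values["gateway"]["livenessProbe"]))
# {"httpGet": {"path": "/gateway/health", "port": "management"}, "initialDelaySeconds": 90}
```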
To verify the configuration after installation, access the health endpoint as described in the next section and check the output.
Access the Health Endpoint
To access the health endpoint, run the following commands:
- From a bash prompt in a terminal, open a shell in the container using kubectl. For example, to open a shell in the gateway container (assuming the image provides /bin/sh), run:
```shell
kubectl exec -it <pod-name> -c gateway -- /bin/sh
```
- Run the following curl command to check the health of the gateway container. This example assumes the endpoint was reconfigured from the default to /gateway/health:
```shell
curl http://localhost:<management-port>/gateway/health
```
The output should be similar to the terminal output below, showing an UP or DOWN status with relevant details:
```json
{"status":"UP","components":{"db":{"status":"UP","details":{"database":"SQLite","validationQuery":"isValid()"}},"discoveryComposite":{"description":"Discovery Client not initialized","status":"UNKNOWN","components":{"discoveryClient":{"description":"Discovery Client not initialized","status":"UNKNOWN"}}},"diskSpace":{"status":"UP","details":{"total":101203873792,"free":78737813504,"threshold":10485760,"exists":true}}}}
```
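For scripted checks, the JSON can be parsed instead of read by eye. The sketch below walks the nested components tree of a response shaped like the sample output above and lists every component whose status is not UP. The SAMPLE string is a copy of that output, and non_up_components is a hypothetical helper.

```python
import json

SAMPLE = (
    '{"status":"UP","components":{'
    '"db":{"status":"UP","details":{"database":"SQLite","validationQuery":"isValid()"}},'
    '"discoveryComposite":{"description":"Discovery Client not initialized","status":"UNKNOWN",'
    '"components":{"discoveryClient":{"description":"Discovery Client not initialized","status":"UNKNOWN"}}},'
    '"diskSpace":{"status":"UP","details":{"total":101203873792,"free":78737813504,'
    '"threshold":10485760,"exists":true}}}}'
)


def non_up_components(node, prefix=""):
    """Recursively collect (path, status) pairs for components whose status is not UP."""
    found = []
    for name, child in node.get("components", {}).items():
        path = f"{prefix}{name}"
        if child.get("status") != "UP":
            found.append((path, child.get("status")))
        found.extend(non_up_components(child, prefix=f"{path}/"))
    return found


health = json.loads(SAMPLE)
print(health["status"])  # UP
print(non_up_components(health))
# [('discoveryComposite', 'UNKNOWN'), ('discoveryComposite/discoveryClient', 'UNKNOWN')]
```

As the sample shows, the overall status can be UP while individual components report UNKNOWN, so a script may want to inspect the components as well as the top-level status.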