Kubernetes Probes

Kubernetes uses two types of probe to monitor container health:

Readiness probes tell Kubernetes when a container is ready to start accepting traffic. A Pod is marked as ready only when all of its containers are ready.
Liveness probes tell Kubernetes when a container needs to be restarted. For example, they can catch a deadlock, where the application is still running but one or more containers are blocked and cannot make progress.

As part of the standard Deployment configuration, Ververica Platform configures default health endpoints for the appmanager and gateway containers. Kubernetes readinessProbe and livenessProbe checks use these endpoints to monitor the health of the running containers.

You can check the health of the appmanager and gateway containers by running a curl command from inside the container; see Access the Health Endpoint below.

Defaults

The default values are specified as follows, based on the Deployment configuration template:

livenessProbe:
  httpGet:
    path: /actuator/health
    port: management
  initialDelaySeconds: 90
  timeoutSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health
    port: management
  initialDelaySeconds: 10
  timeoutSeconds: 10

Configuration

To configure alternative endpoints or change the delay and timeout values, add the appropriate configuration fragment to the main values file of your installation (values.yaml by default), and pass that file on the command line when you install or upgrade Ververica Platform:

helm upgrade --install RELEASE CHART --values values.yaml

Here RELEASE and CHART stand for your Helm release name and the Ververica Platform chart reference.

The configurable values are the following:

MANAGEMENT_ENDPOINTS_WEB_BASE_PATH: Base path used to build the container endpoints.
MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH: Path of the health endpoint, appended to the base path.
initialDelaySeconds: Time to wait after the container starts before the first readinessProbe or livenessProbe is run. Probing starts once this time has elapsed.
timeoutSeconds: Time to wait for the container to respond to a readinessProbe or livenessProbe. If the container does not respond in time, the probe counts as failed. When the number of consecutive failures reaches the probe's failureThreshold (3 by default in Kubernetes), the failure behaviour is triggered: a failing livenessProbe restarts the container, and a failing readinessProbe marks the container, and therefore its Pod, as not ready.
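The failure counting described for timeoutSeconds can be modelled in a few lines. The following Python sketch is an illustration only, not Kubernetes code: it assumes that each probe either succeeds or fails (a timeout counts as a failure), that any success resets the count, and that the failure behaviour fires once consecutive failures reach failureThreshold, whose Kubernetes default is 3. The function name first_trigger is hypothetical.

```python
from typing import Iterable, Optional


def first_trigger(results: Iterable[bool], failure_threshold: int = 3) -> Optional[int]:
    """Return the 0-based index of the probe that triggers the failure
    behaviour, or None if the threshold is never reached.

    Each entry in results is True for a probe that succeeded and False for
    one that failed or did not answer within timeoutSeconds.
    """
    consecutive_failures = 0
    for index, succeeded in enumerate(results):
        # A success resets the count; a failure extends the current run.
        consecutive_failures = 0 if succeeded else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return index
    return None
```

For example, first_trigger([False, False, True, False, False, False]) returns 5: the success at index 2 resets the count, so only the third failure of the final run triggers a restart (liveness) or the not-ready state (readiness).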

Note

Consider the possible impact on probe timing before changing the values of initialDelaySeconds and timeoutSeconds.

Liveness probes must be configured carefully so that they indicate only truly unrecoverable application failure. A misconfigured liveness probe can lead to cascading failures, for example by restarting containers that are merely slow to respond under heavy load.

initialDelaySeconds must give the container enough time to complete its startup tasks, for example starting all the necessary processes.
timeoutSeconds must give the container enough time to respond under all conditions short of an unrecoverable deadlock or failure.
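To get a feel for how these values interact, the following Python sketch estimates how long Kubernetes takes to restart a container that never answers its liveness probe. It is a rough model under stated assumptions: the first probe starts at initialDelaySeconds, later probes start every periodSeconds, each failing probe takes the full timeoutSeconds, and timeoutSeconds does not exceed periodSeconds. periodSeconds and failureThreshold are not part of the Ververica Platform defaults shown above; the sketch assumes the Kubernetes defaults of 10 and 3. The function name is hypothetical.

```python
def approx_seconds_until_restart(initial_delay: int, timeout: int,
                                 period: int = 10,
                                 failure_threshold: int = 3) -> int:
    """Rough estimate of the seconds from container start until a container
    that never answers its livenessProbe is restarted.

    Assumes probes start at initial_delay, initial_delay + period, ... and
    that each one fails only after the full timeout.
    """
    if timeout > period:
        # Overlapping or delayed probes are outside this simple model.
        raise ValueError("model assumes timeoutSeconds <= periodSeconds")
    # Start time of the probe whose failure reaches failure_threshold.
    last_probe_start = initial_delay + (failure_threshold - 1) * period
    return last_probe_start + timeout
```

With the default livenessProbe values (initialDelaySeconds: 90, timeoutSeconds: 10), the estimate is 90 + 2 * 10 + 10 = 120 seconds. If the containers regularly need longer than that to start, initialDelaySeconds is likely too short.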

Example configuration

To configure the settings for the appmanager container, update the values under the appmanager root property in the Helm values file. In this example, the health endpoint is changed from the default /actuator/health to /appmanager/health; the probe timings are unchanged:

appmanager:
  env:
    - name: "MANAGEMENT_ENDPOINTS_WEB_BASE_PATH"
      value: "/appmanager"
    - name: "MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH"
      value: "/health"
  livenessProbe:
    httpGet:
      path: /appmanager/health # default is /actuator/health
      port: management
    initialDelaySeconds: 90
  readinessProbe:
    httpGet:
      path: /appmanager/health # default is /actuator/health
      port: management
    initialDelaySeconds: 10

To configure the settings for the gateway container, update the values under the gateway root property in the Helm values file. In this example, the health endpoint is changed from the default /actuator/health to /gateway/health; the probe timings are unchanged:

gateway:
  env:
    - name: "MANAGEMENT_ENDPOINTS_WEB_BASE_PATH"
      value: "/gateway"
    - name: "MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH"
      value: "/health"
  livenessProbe:
    httpGet:
      path: /gateway/health # default is /actuator/health
      port: management
    initialDelaySeconds: 90
  readinessProbe:
    httpGet:
      path: /gateway/health # default is /actuator/health
      port: management
    initialDelaySeconds: 10
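In both examples, the env values and the probe paths must agree: if the health endpoint moves but a probe still points at /actuator/health, that probe fails. The following Python sketch checks a values fragment for such mismatches. It assumes, as described for MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH above, that the health endpoint is the base path followed by the path mapping, and that the defaults are /actuator and health. The fragment is written as a Python dict to avoid a YAML dependency; loading a real values.yaml would need a parser such as PyYAML. All function and variable names are hypothetical.

```python
from __future__ import annotations

# Assumed defaults, derived from the default endpoint /actuator/health.
DEFAULT_BASE_PATH = "/actuator"
DEFAULT_HEALTH_MAPPING = "health"


def expected_health_path(env: list[dict]) -> str:
    """Join the base path and the health mapping taken from an env list."""
    settings = {entry["name"]: entry["value"] for entry in env}
    base = settings.get("MANAGEMENT_ENDPOINTS_WEB_BASE_PATH", DEFAULT_BASE_PATH)
    mapping = settings.get("MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH",
                           DEFAULT_HEALTH_MAPPING)
    # Strip slashes so that "/gateway" + "/health" becomes "/gateway/health".
    parts = [p.strip("/") for p in (base, mapping) if p.strip("/")]
    return "/" + "/".join(parts)


def mismatched_probes(values: dict) -> list[str]:
    """List every probe whose httpGet.path differs from the expected path."""
    problems = []
    for component, config in values.items():
        if not isinstance(config, dict):
            continue  # skip scalar top-level settings
        expected = expected_health_path(config.get("env", []))
        for probe in ("livenessProbe", "readinessProbe"):
            path = config.get(probe, {}).get("httpGet", {}).get("path")
            if path is not None and path != expected:
                problems.append(f"{component}.{probe}: {path} != {expected}")
    return problems


# A gateway fragment whose readinessProbe path was not updated.
values = {
    "gateway": {
        "env": [
            {"name": "MANAGEMENT_ENDPOINTS_WEB_BASE_PATH", "value": "/gateway"},
            {"name": "MANAGEMENT_ENDPOINTS_WEB_PATH_MAPPING_HEALTH", "value": "/health"},
        ],
        "livenessProbe": {"httpGet": {"path": "/gateway/health", "port": "management"}},
        "readinessProbe": {"httpGet": {"path": "/actuator/health", "port": "management"}},
    },
}

print(mismatched_probes(values))
# ['gateway.readinessProbe: /actuator/health != /gateway/health']
```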

After installation, verify the configuration by accessing the health endpoint as described in Access the Health Endpoint and checking the output.

Access the Health Endpoint

To access the health endpoint, run the following commands:

From a terminal, open a shell in the container using kubectl. For example, to open a shell in the gateway container, where pod-name is the name of the Ververica Platform Pod:

kubectl exec -it pod-name -c gateway -- sh

Run the following curl command to check the health of the gateway container. This example assumes the endpoint was reconfigured from the default to /gateway/health; management-port stands for the number of the container's management port:

curl http://localhost:management-port/gateway/health

The output should be similar to the following, showing an UP or DOWN status with relevant details:

{"status":"UP","components":{"db":{"status":"UP","details":{"database":"SQLite","validationQuery":"isValid()"}},"discoveryComposite":{"description":"Discovery Client not initialized","status":"UNKNOWN","components":{"discoveryClient":{"description":"Discovery Client not initialized","status":"UNKNOWN"}}},"diskSpace":{"status":"UP","details":{"total":101203873792,"free":78737813504,"threshold":10485760,"exists":true}}}}
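The same check can be scripted. The following Python sketch, which uses only the standard library, fetches the health endpoint, reports whether Kubernetes would count the answer as a successful httpGet probe (any status code from 200 to 399), and lists each component's status from output like the sample above. It assumes the endpoint is served by Spring Boot Actuator, as the /actuator/health path and the MANAGEMENT_ENDPOINTS_* settings suggest, and therefore answers a DOWN status with HTTP 503 and a JSON body by default. It also assumes the endpoint is reachable from where the script runs, for example through kubectl port-forward, since the container image may not include Python. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Iterator, Tuple


def fetch_health(port: int, path: str = "/actuator/health",
                 host: str = "localhost",
                 timeout: float = 10.0) -> Tuple[int, dict]:
    """GET the health endpoint and return (HTTP status code, decoded JSON body)."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as error:
        # urllib raises on non-2xx codes; a DOWN answer (assumed 503) still
        # carries the JSON status document in its body.
        return error.code, json.load(error)


def probe_passes(status_code: int) -> bool:
    """Kubernetes httpGet probes treat any code from 200 to 399 as success."""
    return 200 <= status_code < 400


def component_statuses(health: dict,
                       prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted component name, status) for every component, depth first."""
    for name, component in health.get("components", {}).items():
        full_name = prefix + name
        yield full_name, component.get("status", "UNKNOWN")
        # Composite indicators such as discoveryComposite nest more components.
        yield from component_statuses(component, full_name + ".")
```

For example, after forwarding a local port to the management port, fetch_health(LOCAL_PORT, "/gateway/health") returns the status code and the decoded body, where LOCAL_PORT is a placeholder. For the sample output above, the components whose status is not UP are discoveryComposite and discoveryComposite.discoveryClient. They do not make the overall status DOWN because, with Spring Boot's default status ordering, UP takes precedence over UNKNOWN.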
