Session Clusters

Session clusters are long-lived Apache Flink® clusters that can execute multiple applications simultaneously or run short-lived, interactive jobs on demand. Deployments can be executed on a session cluster by using session mode.

Limitations

Support for session clusters currently has some limitations compared with Deployments:

SSL/TLS: Auto-provisioned SSL/TLS for Flink intra-cluster and external communication is not supported. SSL/TLS has to be configured manually.

Autopilot: Autoscaling is not supported for session clusters and is limited to Deployments running in session mode.
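Because SSL/TLS is not auto-provisioned for session clusters, the relevant Flink options have to be set by hand. The fragment below is a rough sketch of what that could look like under spec.flinkConfiguration, using Flink's standard security.ssl.* options. The keystore and truststore paths and the password values are hypothetical placeholders, and how those files reach the pods (for example, through mounted Kubernetes secrets) is an assumption that this page does not describe.

```yaml
spec:
  flinkConfiguration:
    # Encrypt communication between Flink processes (internal connectivity).
    security.ssl.internal.enabled: "true"
    security.ssl.internal.keystore: /path/to/internal.keystore        # hypothetical path
    security.ssl.internal.truststore: /path/to/internal.truststore    # hypothetical path
    security.ssl.internal.keystore-password: "<keystore-password>"    # placeholder
    security.ssl.internal.key-password: "<key-password>"              # placeholder
    security.ssl.internal.truststore-password: "<truststore-password>" # placeholder
    # Encrypt the REST endpoint (external connectivity).
    security.ssl.rest.enabled: "true"
    security.ssl.rest.keystore: /path/to/rest.keystore                # hypothetical path
    security.ssl.rest.keystore-password: "<keystore-password>"        # placeholder
    security.ssl.rest.key-password: "<key-password>"                  # placeholder
```

Treat this as a starting point only. The exact set of options required depends on the Flink version in use and on whether mutual authentication is needed for the REST endpoint.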

Specification

Session clusters are managed via namespaced SessionCluster resources, which are configured similarly to Deployments. However, a SessionCluster has fewer configurable options than a Deployment because the resource configures only the Flink cluster itself, not the applications that will run on it.

Desired State

A SessionCluster resource has a desired state specified at spec.state. The desired state can be either:

  • RUNNING when the cluster should be provisioned and kept running, or
  • STOPPED when the cluster should be torn down, along with all currently running applications

Attention

All Deployments running on a session cluster must be terminated before the session cluster can be stopped.
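As a minimal sketch of the desired state described above, the fragment below sets spec.state to STOPPED for an existing SessionCluster. The name my-session-cluster is a hypothetical placeholder; the remaining keys are the ones used in the full example on this page.

```yaml
kind: SessionCluster
apiVersion: v1
metadata:
  name: my-session-cluster   # hypothetical name, replace with your own
spec:
  state: STOPPED             # set back to RUNNING to provision the cluster again
```

Whether a partial resource like this is accepted depends on how the update is submitted, so read it as an illustration of the relevant key rather than a complete resource.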

Changing a Running SessionCluster

Only the desired state and the number of TaskManagers of a session cluster may be changed while the cluster is in a non-terminal state. A SessionCluster is in a “terminal state” when its desired state is STOPPED and there are no in-progress operations on the cluster (for example, the cluster starting, stopping, or being updated).

Note

Scaling down a running session cluster (by reducing the value of spec.numberOfTaskManagers) can cause applications running on the cluster to restart.
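The fragment below sketches a scaling change on a running session cluster by editing spec.numberOfTaskManagers. The name my-session-cluster and the value 8 are illustrative assumptions, not recommendations.

```yaml
kind: SessionCluster
apiVersion: v1
metadata:
  name: my-session-cluster   # hypothetical name, replace with your own
spec:
  state: RUNNING             # the cluster stays in a non-terminal state
  numberOfTaskManagers: 8    # illustrative value; lowering it may restart running applications
```

Only spec.state and spec.numberOfTaskManagers are changed here, since these are the two fields that may be modified while the cluster is in a non-terminal state.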

Full Example

The following snippet is a complete example of a SessionCluster, including optional keys.

kind: SessionCluster
apiVersion: v1
metadata:
  name:
  labels:
    env: testing
spec:
  state: RUNNING
  deploymentTargetName: default
  flinkVersion: 1.12
  flinkImageRegistry: registry.ververica.com/v2.4
  flinkImageRepository: flink
  flinkImageTag: 1.12.2-stream1-scala_2.12
  numberOfTaskManagers: 5
  resources:
    jobmanager:
      cpu: 2
      memory: 1g
    taskmanager:
      cpu: 16
      memory: 32g
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: 32
  logging:
    loggingProfile: default
    log4jLoggers:
      "": INFO
      org.apache.flink.streaming.examples: DEBUG
  kubernetes:
    pods:
      envVars:
      - name: KEY
        value: VALUE