Apache Flink® Session Clusters
Session clusters are long-lived Apache Flink® clusters that can be used to execute multiple applications simultaneously or run short-lived, interactive jobs on demand.
While session clusters are useful in certain situations, Deployments should still be used for all production applications.
In a future version of Ververica Platform, users will be able to configure Deployments to run in a specified SessionCluster. For now, however, session clusters in Ververica Platform are simply managed, standalone Apache Flink® clusters, and users are responsible for submitting and managing the jobs that run on them.
Support for session clusters in Ververica Platform is a new feature, and currently has some limitations compared with Deployments:
- SSL/TLS: Session clusters in Ververica Platform do not currently support auto-provisioned SSL/TLS for Flink intra-cluster and external communication.
- Kubernetes HA: Ververica Platform’s built-in Flink Kubernetes HA support does not yet work with session clusters. HA can still be configured per the Apache Flink® documentation by passing the appropriate Flink configuration options in the SessionCluster resource.
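As a rough sketch of the manual HA route mentioned above, the fragment below passes Apache Flink®'s ZooKeeper high-availability options through the flinkConfiguration map that appears in the complete example on this page. The cluster name, ZooKeeper address and storage path are hypothetical placeholders, and other keys a real SessionCluster needs are left out. Check the option names against the Apache Flink® documentation for the Flink version in use.

```yaml
kind: SessionCluster
apiVersion: v1
metadata:
  name: my-session-cluster          # hypothetical name
spec:
  state: RUNNING
  flinkConfiguration:
    # Apache Flink HA options; every value below is a placeholder.
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-0.example.internal:2181
    high-availability.storageDir: s3://example-bucket/flink/ha
    high-availability.cluster-id: /my-session-cluster
```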
Session clusters are managed via namespaced SessionCluster resources which are configured similarly to Deployments. However, SessionClusters have fewer configurable options than Deployments since this resource only configures the Apache Flink® cluster itself and not the applications that will run on it.
Like Deployments, a SessionCluster resource has a desired state, specified at spec.state. The desired state can be either:
- RUNNING when the cluster should be provisioned and kept running, or
- STOPPED when the cluster should be torn down, along with all currently running applications.
As soon as a running SessionCluster is given the desired state STOPPED, all jobs running on the cluster are terminated without a grace period and the cluster resources are torn down. It is the user’s responsibility to take care when stopping a session cluster.
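For illustration, stopping a cluster amounts to changing the desired state in the resource. The fragment below is a minimal sketch that assumes the state key sits under spec, as in the complete example on this page. The cluster name is hypothetical and all other keys are omitted.

```yaml
kind: SessionCluster
apiVersion: v1
metadata:
  name: my-session-cluster   # hypothetical name
spec:
  # Changing RUNNING to STOPPED tears the cluster down and
  # terminates every job on it without a grace period.
  state: STOPPED
```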
Changing a Running SessionCluster
Only the desired state and the number of TaskManagers of a SessionCluster may be changed while the cluster is in a non-terminal state.
Scaling down a running SessionCluster (by reducing the value of spec.numberOfTaskManagers) can cause applications running on the cluster to restart.
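A scale-down might look like the following sketch, which assumes the numberOfTaskManagers key shown in the complete example on this page and a hypothetical cluster name. Only the fields relevant to the change are shown. How the updated resource is submitted is not covered here.

```yaml
kind: SessionCluster
apiVersion: v1
metadata:
  name: my-session-cluster   # hypothetical name
spec:
  state: RUNNING
  # Reduced from 5; jobs using slots on the removed
  # TaskManagers may restart as a result.
  numberOfTaskManagers: 3
```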
Differences from Deployments
Any configuration options specific to an Apache Flink® application are omitted from the SessionCluster resource. This includes options such as the location of an application jar artifact, the upgrade strategy configuration, and job parallelism, which are all specific to a single job.
Additionally, unlike Deployments, a SessionCluster’s name rather than its ID is used to identify it when using the Ververica Platform API.
Apart from these differences, SessionClusters generally support the same configuration as Deployments, including the Flink version and image, CPU/memory resource limits, and logging configuration.
The following snippet is a complete example of a SessionCluster, including optional keys.
kind: SessionCluster
apiVersion: v1
metadata:
  name:
  labels:
    env: testing
spec:
  state: RUNNING
  deploymentTargetName: default
  flinkVersion: 1.12
  flinkImageRegistry: registry.ververica.com
  flinkImageRepository: flink
  flinkImageTag: 1.12.0-stream1-scala_2.12
  numberOfTaskManagers: 5
  resources:
    jobmanager:
      cpu: 2
      memory: 1Gi
    taskmanager:
      cpu: 16
      memory: 32Gi
  flinkConfiguration:
    execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
  logging:
    loggingProfile: default
    log4jLoggers:
      "": INFO
      org.apache.flink.streaming.examples: DEBUG
SessionClusters accept exactly the same Kubernetes Pod template configuration as Deployments (see Apache Flink® Pod Templates) under the key spec.kubernetes.pods.
A minimal example:
kind: SessionCluster
spec:
  kubernetes:
    pods:
      envVars:
        - name: name
          value: value