Session clusters are suitable for development and test purposes in non-production environments. You can deploy or debug jobs in a session cluster to improve the resource utilization of a JobManager and accelerate the job startup.
A session cluster allows multiple jobs to share the same JobManager, which increases the resource utilization of the JobManager. However, when multiple jobs run on the same JobManager, the stability of the jobs can be affected. Session clusters also do not support monitoring and alerting for individual jobs. Therefore, use session clusters only to test jobs.
When you create a session cluster, the cluster resources are consumed regardless of whether you use the session cluster. The resource consumption is based on the configurations that you select when you create the cluster.
Session clusters are suitable for development and test environments. We recommend that you do not use session clusters in production environments. If you use session clusters in production environments, the following stability issues may occur:
- If the JobManager fails, all jobs running on the session cluster are affected.
- If a TaskManager fails, the jobs that have tasks running on that TaskManager are affected.
- If processes are not isolated for tasks that run on the same TaskManager, the tasks may affect each other.
If the session cluster uses the default configurations, take note of the following points:
- For small jobs, we recommend that the total number of such jobs in a cluster be no more than 100.
- For complex jobs, we recommend that the parallelism of a single job be no more than 512, and that a single cluster run no more than 32 medium-sized jobs with a parallelism of 64 each. Otherwise, issues such as heartbeat timeouts may occur and the stability of the cluster may be affected. In this case, you must increase the heartbeat interval and the heartbeat timeout period.
- If you want to run more tasks at the same time, you must increase the resource configuration of the session cluster.
Create a session cluster
- Log in to Ververica Cloud.
- On the Dashboard page, locate the workspace you want to manage, click the title of the workspace or this icon ⋮, and select Open Console.
- In the left-side navigation pane of the Console, click Session Clusters, and then click Create Session Cluster.
- Configure the parameters. See the parameter descriptions below.
- Click Create Session Cluster.
After the session cluster is configured and created, you can start it.
|Parameter|Description|
|---|---|
|Name|The name of the session cluster.|
|State|The desired state of the cluster. Valid values: STOPPED: The cluster is stopped after it is configured, and the jobs in the cluster are also stopped. RUNNING: The cluster keeps running after it is configured.|
|Label key|You can configure labels for jobs in the Labels section. This allows you to find a job on the Overview page efficiently.|
|Engine Version|The version of the Flink engine that is used by the session cluster. All jobs running on the session cluster use this version.|
|Flink Restart Policy|Valid values: Failure Rate: Jobs are restarted based on the failure rate. If you select this option, you must also configure Failure Rate Interval, Max Failures per Interval, and Delay between Restart Attempts. Fixed Delay: Jobs are restarted after a fixed delay. If you select this option, you must also configure Number of Restart Attempts and Delay between Restart Attempts. No Restarts: Jobs are not restarted. Important! If you leave this parameter empty, the default Apache Flink restart policy is used. In this case, if a task fails and checkpointing is disabled, the JobManager is not restarted. If you enable checkpointing, the JobManager is restarted.|
|Additional Flink Configuration|Other Flink settings, such as taskmanager.numberOfTaskSlots: 1.|
Important: Pay attention to the notes and cautions below the table.
|Parameter|Description|Default|
|---|---|---|
|Number of TaskManagers|The number of TaskManagers in your Flink cluster.|1|
|JobManager CPU Cores|The number of CPU cores allocated to the JobManager.|1|
|JobManager Memory|The memory allocated to the JobManager. The minimum value is 1 GiB. We recommend using GiB or MiB as the unit (for example, 1024 MiB or 1.5 GiB).|-|
|TaskManager CPU Cores|The number of CPU cores allocated to each TaskManager.|2|
|TaskManager Memory|The memory allocated to each TaskManager. The minimum value is 1 GiB. We recommend using GiB or MiB as the unit (for example, 1024 MiB or 1.5 GiB).|-|
We recommend that you configure JobManager resources and heartbeat-related parameters for the JobManager. When you configure the JobManager, take note of the following points:
- The JobManager provides features, such as TaskManager heartbeat, task serialization, and resource scheduling. Therefore, we recommend that the resource configuration for the JobManager be no less than the default configuration. Adjust this based on the workload of your cluster.
- To ensure cluster stability, you must prevent heartbeat timeouts caused by a busy JobManager main thread. Therefore, we recommend that you set the heartbeat interval to at least 10 seconds and the heartbeat timeout period to at least 50 seconds. Both can be specified in Additional Flink Configuration. The heartbeat interval is specified by the heartbeat.interval parameter and the heartbeat timeout period is specified by the heartbeat.timeout parameter. You can increase the values of these parameters as the number of TaskManagers and jobs increases.
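As a minimal sketch, the recommended minimum heartbeat values could be entered in Additional Flink Configuration as shown below. It assumes the standard Apache Flink behavior in which plain numeric values for these two options are interpreted as milliseconds. The numbers are the minimums from the recommendation above, not tuned values for any particular workload.

```yaml
# Heartbeat interval: 10 seconds, expressed in milliseconds (assumed unit).
heartbeat.interval: 10000
# Heartbeat timeout: 50 seconds, expressed in milliseconds (assumed unit).
heartbeat.timeout: 50000
```

The timeout should stay several times larger than the interval so that a few delayed heartbeats do not mark a TaskManager as lost.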
We recommend that you specify the number of slots for each TaskManager and the amount of resources that are available for TaskManagers. The number of slots is specified by the taskmanager.numberOfTaskSlots parameter. When you configure this parameter, take note of the following points:
- For a single small job, we recommend that you set the CPU-to-memory ratio of a single slot to 1:4 and configure at least 1 CPU core and 2 GiB of memory for each slot.
- For a complex job, we recommend that you configure at least 1 CPU core and 4 GiB of memory for each slot. If you use the default resource configuration, you can configure two slots for each TaskManager.
- We recommend that you use the default resource configuration for each TaskManager and set the number of slots to 2.
If the resources configured for a TaskManager are insufficient, the stability of the jobs that run on the TaskManager is affected. In addition, the TaskManager has too few slots to spread the load evenly, which reduces resource utilization.
However, if you configure too many resources for a TaskManager, a large number of jobs run on it. If the TaskManager fails, all of these jobs are affected.
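The slot recommendation above can be sketched as a single entry in Additional Flink Configuration. The 8 GiB figure in the comment is an illustrative assumption, chosen only so that each of the two slots receives 4 GiB. It is not a documented default.

```yaml
# Two slots per TaskManager. With 2 TaskManager CPU cores, each slot gets 1 core.
# Assumption for illustration: TaskManager Memory is set to 8 GiB in the form,
# so each slot gets 4 GiB, the suggested minimum for complex jobs.
taskmanager.numberOfTaskSlots: 2
```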
|Parameter|Description|
|---|---|
|Root Log Level|Valid values, ordered from the lowest level to the highest: TRACE, DEBUG, INFO, WARN, and ERROR.|
|Log Levels|The name and level of the log.|
|Logging Profile|The log template. You can use the default template or configure a custom template.|
Start a session cluster
To start a session cluster after it is created:
Click Start in the Actions column on the Session Clusters page.
After the session cluster enters the Running state, you can select it in the Configuration of a deployment.
You can also use the session cluster to debug your SQL scripts.
Delete a session cluster
If a session cluster is no longer used, you can delete it by following the steps below.
- On the Session Clusters page, stop the session cluster that you want to delete.
- When the status changes to STOPPED, click Delete in the Actions column.