Tune Performance
Effective resource management is crucial for maintaining the performance and efficiency of deployments. Autopilot and Scheduled Tuning provide automated solutions for optimizing resource allocation and adapting to workload demands. This guide explains how to configure and leverage these features for optimal deployment performance.
Overview
Deployment tuning typically requires a significant time investment. For example:
- When you publish a draft, you must configure resources, parallelism, and the number and size of TaskManagers for the draft.
- When a deployment is running, you must adjust the resources of the deployment to maximize resource utilization.
- If backpressure occurs on the deployment or the latency increases, you must adjust the configurations of the deployment.
Tuning Modes
Ververica recommends that you choose a tuning mode that best meets your business requirements. The three available tuning modes are:
- Default mode allows manual tuning based on system-generated recommendations for the running deployment.
- Autopilot dynamically adjusts resources based on workload demands, ensuring efficient utilization without manual intervention. It continuously monitors performance metrics and scales resources accordingly.
- Scheduled Tuning allows for predefined optimizations at specific intervals. This helps manage predictable workload patterns and ensures deployments maintain peak efficiency.
The following table describes each of the available tuning modes, the scenarios that are best suited for each one, the benefits of each mode, and recommended resources for further learning.
About Autopilot Mode
Autopilot optimizes deployment performance using two distinct strategies:
- Stable Strategy: Maintains a steady configuration once optimal settings are achieved, preventing unnecessary adjustments.
- Adaptive Strategy: Continuously adjusts parameters in response to system demands and performance fluctuations.
The Stable Strategy ensures that once a deployment reaches a steady state, Autopilot stops making adjustments under the following conditions:
- No adjustments have been made for 24 consecutive hours.
- The system has been running for 72 hours in Stable Strategy, regardless of adjustments.
Once either condition is met, Autopilot ceases parameter modifications. However, restarting the deployment resets all Stable Strategy statuses, and the 24-hour and 72-hour conditions begin recalculating from scratch.
Changes to Stable Strategy parameters are saved but do not reset these conditions. Only a deployment restart resets the Stable Strategy logic.
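The two stop conditions can be read as a simple check over timestamps. The following is a minimal sketch rather than Autopilot's actual implementation: the function name, its arguments, and the choice to measure the 24-hour window from the deployment start when no adjustment has happened yet are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

NO_ADJUSTMENT_WINDOW = timedelta(hours=24)  # 24 consecutive hours without adjustments
MAX_STABLE_RUNTIME = timedelta(hours=72)    # 72 hours running in Stable Strategy


def stable_strategy_frozen(
    started_at: datetime,
    last_adjusted_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True once either stop condition from the text is met.

    started_at is the time of the most recent deployment (re)start, because
    a restart resets both conditions. Hypothetical helper, not a product API.
    """
    # 72 hours in Stable Strategy, regardless of adjustments.
    if now - started_at >= MAX_STABLE_RUNTIME:
        return True
    # No adjustments for 24 consecutive hours.
    # Assumption: before the first adjustment, count from the start time.
    reference = last_adjusted_at or started_at
    return now - reference >= NO_ADJUSTMENT_WINDOW
```

Because a restart resets both conditions, a caller would pass the latest restart time as `started_at` and clear `last_adjusted_at` after each restart.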
Unlike the Stable Strategy, the Adaptive Strategy continuously monitors system behavior and resource usage, making real-time adjustments. This strategy is ideal for deployments with dynamic workloads requiring constant optimization.
Limits and Considerations
Review these constraints when using Autopilot.
- Unaligned Checkpoints: You cannot modify the parallelism for a deployment if you enable the Unaligned Checkpoints feature.
- Session Clusters: Autopilot is not supported.
- Performance Bottlenecks: Autopilot cannot resolve all bottlenecks, as performance is influenced by upstream and downstream systems. It works best when:
  - Traffic changes smoothly.
  - No data skew exists.
  - Throughput scales linearly with increased parallelism.
  If these conditions are not met, issues may arise, such as:
  - Parallelism changes failing or deployments repeatedly restarting.
  - Performance degradation in UDSFs, UDAFs, or UDTFs.
  - Increased parallelism overloading external systems, leading to failures.
Review these key considerations when using Autopilot.
- Deployment Restarts: Autopilot restarts deployments when triggered, temporarily pausing data processing.
- Trigger Interval: Autopilot triggers every 10 minutes by default, configurable via the cooldown.minutes parameter.
- Manual Parallelism Configuration: If a DataStream deployment or custom SQL connector explicitly sets parallelism, Autopilot is disabled.
- Policy Timing: A new Autopilot policy cannot be triggered within 30 minutes of an existing policy.
Default Tuning Actions
When enabled, Autopilot automatically adjusts resource configurations based on system metrics.
Parallelism Adjustments
Autopilot optimizes deployment throughput by dynamically adjusting parallelism based on system performance.
- No change needed: If deployment delay remains below 60s, parallelism stays the same.
- Scaling up: If deployment delay exceeds 60s and continues increasing for 3 minutes, parallelism is increased up to twice the current processing capacity (capped at 64 CUs).
- Scaling down: If CPU utilization or vertex node processing time remains below 20% for 24 consecutive hours, parallelism is reduced to optimize resource efficiency.
- Other conditions:
- If vertex node processing time exceeds 80% for 6 minutes, parallelism is increased to lower slot utilization to 50%.
- If average CPU utilization of all TaskManagers exceeds 80% for 6 minutes, parallelism is increased to bring CPU usage down to 50%.
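The parallelism rules above can be summarized as a small decision function. This is an illustrative sketch only: the `Metrics` fields, the function name, and the order in which the rules are checked are assumptions, and the 64 CU cap is not modeled because the text does not say how CUs map to parallelism. The utilization-based targets assume throughput scales linearly with parallelism, which is one of the conditions Autopilot relies on.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Metrics:
    # Hypothetical metric snapshot; field names are not part of the product.
    delay_seconds: float = 0.0         # current deployment delay
    delay_rising_minutes: float = 0.0  # how long the delay has kept increasing
    busy_ratio: float = 0.5            # vertex node processing time, 0..1
    busy_high_minutes: float = 0.0     # how long busy_ratio has exceeded 80%
    cpu_ratio: float = 0.5             # average TaskManager CPU utilization, 0..1
    cpu_high_minutes: float = 0.0      # how long cpu_ratio has exceeded 80%
    low_load_hours: float = 0.0        # how long CPU or busy time has stayed below 20%


def decide_parallelism(m: Metrics, current: int) -> Tuple[str, Optional[int]]:
    """Return (action, target parallelism) following the rules in the text."""
    # Delay above 60s and still increasing for 3 minutes: at most double.
    if m.delay_seconds > 60 and m.delay_rising_minutes >= 3:
        return "scale_up", current * 2
    # Processing time above 80% for 6 minutes: aim for 50% slot utilization.
    if m.busy_ratio > 0.80 and m.busy_high_minutes >= 6:
        return "scale_up", math.ceil(current * m.busy_ratio / 0.50)
    # Average CPU above 80% for 6 minutes: aim for 50% CPU usage.
    if m.cpu_ratio > 0.80 and m.cpu_high_minutes >= 6:
        return "scale_up", math.ceil(current * m.cpu_ratio / 0.50)
    # Below 20% for 24 hours: scale down. The text gives no reduction amount,
    # so no target is returned.
    if (m.cpu_ratio < 0.20 or m.busy_ratio < 0.20) and m.low_load_hours >= 24:
        return "scale_down", None
    return "no_change", current
```

For example, a deployment at parallelism 8 whose delay is 90 seconds and has been rising for 3 minutes would be doubled to 16 under this sketch.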
Memory Optimization
Autopilot monitors memory usage and adjusts configurations to prevent failures.
- Scaling up:
- If the JobManager experiences frequent garbage collection (GC) or out-of-memory (OOM) errors, memory is increased (up to 16 GiB).
- If a TaskManager experiences GC, OOM, or HeartBeatTimeout errors, memory is increased (up to 16 GiB).
- If TaskManager memory usage exceeds 95%, memory allocation is increased.
- Scaling down:
- If TaskManager memory usage falls below 30% for 24 hours, memory allocation is reduced (minimum 1.6 GiB).
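As a rough illustration of the TaskManager memory rules, the sketch below picks a direction and clamps a proposed size to the stated limits. The function names and arguments are hypothetical. The text gives the 16 GiB ceiling for error-driven increases and the 1.6 GiB floor for reductions but no step sizes, so applying both bounds to any proposed value is an assumption.

```python
MAX_MEMORY_GIB = 16.0     # upper limit stated for memory increases
MIN_TM_MEMORY_GIB = 1.6   # lower limit stated for TaskManager reductions


def tm_memory_action(usage_ratio: float, low_usage_hours: float, has_errors: bool) -> str:
    """Pick a direction for TaskManager memory.

    has_errors stands for the GC, OOM, or HeartBeatTimeout errors from the text.
    """
    # Errors or usage above 95% call for more memory.
    if has_errors or usage_ratio > 0.95:
        return "scale_up"
    # Usage below 30% for 24 hours allows a reduction.
    if usage_ratio < 0.30 and low_usage_hours >= 24:
        return "scale_down"
    return "no_change"


def clamp_tm_memory(proposed_gib: float) -> float:
    """Keep a proposed TaskManager memory size within the documented bounds."""
    return min(MAX_MEMORY_GIB, max(MIN_TM_MEMORY_GIB, proposed_gib))
```

Separating the decision from the clamp keeps the documented thresholds in one place while leaving the step size, which the text does not specify, to the caller.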
Run a Deployment Using Autopilot
You can enable and configure Autopilot when starting a job or from the Deployments > Resources tab.
- In the left navigation, click the Namespace selector and select the namespace that you want to open.
- Click Deployments.
- On the Deployments page, click the name of the desired deployment.
- Choose one of the following methods for enabling Autopilot.
- To enable Autopilot on an existing deployment:
- Open the Resources tab.
- Click Autopilot Mode and toggle Autopilot to ON.
- Click Edit in the Configurations section.
- To enable Autopilot when starting a job:
- Click Start at the top right of the Deployments window.
- Select the job start mode (Initial Mode or Resume Mode). For details, see Starting Jobs.
- Toggle Configure Autopilot to ON.
- Set the Resource Tuning Mode to Autopilot Mode.
- After enabling Autopilot, configure the tuning strategy:
- Select a resource tuning strategy:
- Stable Strategy: Minimizes the impact of restarts on jobs and tunes resources based on the job's behavior over a longer cycle, so that the job converges on a stable configuration as quickly as possible.
- Adaptive Strategy: Prioritizes the current latency and resource usage of the job, and adjusts resources more quickly in response to changes in the related metrics.
- Edit the parameters. See Autopilot Parameters.
- Click Save.
Autopilot Parameters
About Scheduled Mode
Scheduled Mode is a good choice when you know your peak traffic patterns in advance, such as traffic for big events like Black Friday, or when you have predictable high-traffic and low-traffic periods during the day. For example, you run a taxi service and a football game takes place every Sunday, or you run a live broadcasting service and traffic spikes every evening during a popular show. You can create a scheduled plan to handle these traffic peaks.
Scheduled Mode also covers scenarios that Autopilot does not cover. For example:
- If traffic fluctuates frequently, Autopilot is triggered repeatedly and the job restarts repeatedly.
- If traffic changes slowly, Autopilot may not detect the change, so tuning cannot be completed in one attempt and takes many iterations to reach a good result.
In these scenarios, you can use Scheduled Mode to set the optimal resources for the job based on your business characteristics. The job is not restarted when traffic fluctuates frequently, and if you know the resources required during peak or trough traffic, you can set the job to the appropriate configuration for each period.
Run a Deployment Using Scheduled Mode
Instead of using Autopilot Mode, you can set up your own scheduled plans to run deployments at specific times, with specific tuning parameters. To schedule a deployment, at least one scheduled plan must exist.
Create a Plan
You can create a scheduled plan and then apply it to a job. The plan is then available to select when you start a deployment.
Once you have scheduled one or more plans, they apply to all the running jobs under that deployment.
- In the left navigation, click Deployments > Resources.
- Click Scheduled Mode.
- In the Resource Plans section, click New Plan.
- Enter a Plan Name and configure the parameters:
- Trigger Period: Valid values: No Repeat, Every Day, Every Week, and Every Month. If you set this parameter to Every Week or Every Month, you must specify the related time range during which you want the policy to take effect.
- Trigger Time: The time when the plan takes effect.
- For other parameter descriptions, see Resources and Parameters.
- (Optional) Scroll down and click New Resource Setting Period below the Resource Setting panel and create another set of parameters to control the schedule (e.g. another time period).
- Click OK.
The plan will be saved and listed in the Resource Plans section.
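The plan parameters described in these steps can be pictured as a small record with one validation rule. This is a hypothetical data model for illustration, not the product's API: the class, field names, and string formats are assumptions, while the valid Trigger Period values and the time range requirement come from the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional

TRIGGER_PERIODS = ("No Repeat", "Every Day", "Every Week", "Every Month")
NEEDS_TIME_RANGE = ("Every Week", "Every Month")


@dataclass
class ScheduledPlan:
    name: str                         # Plan Name
    trigger_period: str               # one of TRIGGER_PERIODS
    trigger_time: str                 # for example "18:00"; the format is an assumption
    time_range: Optional[str] = None  # for example "Sunday"; the representation is an assumption


def validate_plan(plan: ScheduledPlan) -> List[str]:
    """Return a list of problems; an empty list means the plan looks valid."""
    errors = []
    if not plan.name:
        errors.append("Plan Name is required")
    if plan.trigger_period not in TRIGGER_PERIODS:
        errors.append("Unknown Trigger Period: " + plan.trigger_period)
    elif plan.trigger_period in NEEDS_TIME_RANGE and not plan.time_range:
        # Every Week and Every Month need the time range in which the plan applies.
        errors.append(plan.trigger_period + " requires a time range")
    return errors
```

For instance, a weekly plan for the Sunday football game would need both a Trigger Time and a time range such as the day of the week before it passes this check.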
Start a Job Using a Scheduled Plan
- Click the deployment you want to start in the Deployments window.
- Click Start in the Deployments toolbar.
There must be at least one saved plan available to apply at startup.
- In the Start Job dialog, set the mode (Initial or Resume).
- Specify a start time, if appropriate.
- Click to set Configure Autopilot to ON.
- Set the Resource Tuning Mode to Scheduled Mode.
- Select a scheduled plan from the drop-down menu.
If no scheduled plans are available, or if you want to create another one, you can choose Create new scheduled plan. This takes you back to the main Resources window, where you need to follow the instructions in Create a Plan.
- Click Start. The job will start with the specified scheduled plan.
Change the Applied Plan
You can change the scheduled plan that applies to the currently running job.
This might cause the job to restart.
- In the Deployments > Resources tab, locate the entry for the scheduled plan that is applied to the running job.
- Click Stop Applying. You can do this in either of two ways:
  - Click Stop Applying next to the scheduled plan entry in the Resource Plans list.
  - Click the main Stop Applying button near the top of the Resources tab. This is useful if you have many scheduled plans defined and can't easily see the one you want in the Resource Plans list.
- Click Apply next to the new scheduled plan.
Edit an Existing Plan
- You cannot edit the name of an existing plan. To rename a plan, delete it and recreate it with the new name, or create a new plan.
- You cannot edit the details of a plan that is currently applied to a running job.
To edit the details of an existing scheduled plan:
- Display the Deployments > Resources tab.
- Click the name of the scheduled plan, or Details, in the Resource Plans table.
- Click Edit at the top of the resulting plan screen.
- Change the parameters.
- Click Save.
Delete a Plan
You cannot delete a plan that is currently applied to a running job.
To delete a saved plan:
- In the Deployments > Resources tab, locate the entry for the scheduled plan that you want to delete.
- Click Delete next to the plan's entry.
- Click OK to confirm.