Disaster Recovery Checkpoint
Disaster Recovery is an essential part of a High Availability setup.
In a disaster recovery scenario where there has been a full Kubernetes failure, the goal is to relaunch Ververica Platform with all deployments starting from the latest known state. To enable this, Ververica Platform stores a permanent reference in the job to a Disaster Recovery Checkpoint.
Configuration
The elapsed time between Disaster Recovery Checkpoints is determined
by the checkpoint configuration you set and the Controller monitoring
frequency for Ververica Platform. The monitoring interval is 3s.
To configure a Disaster Recovery Checkpoint, set a value n for the
disasterCheckpointsDelay configuration property in your Deployment
configuration YAML file. With the property enabled, every n-th
standard Flink checkpoint will be saved to the job as the updated
Disaster Recovery Checkpoint.
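As a rough illustration of the timing, the fragment below assumes that the Flink checkpoint interval is set through the flinkConfiguration section of the Deployment template using the standard Flink key execution.checkpointing.interval. The 60s value is only an example, and the estimate in the comments assumes checkpoints complete on schedule.

```yaml
spec:
  template:
    spec:
      flinkConfiguration:
        # Illustrative value: one standard Flink checkpoint every 60 seconds.
        execution.checkpointing.interval: 60s
# With disasterCheckpointsDelay set to 5, every 5th checkpoint becomes the
# Disaster Recovery Checkpoint, so it is refreshed roughly every
# 5 x 60s = 5 minutes, plus the time the Controller needs to notice the
# new checkpoint (monitoring interval of 3s).
```

A shorter checkpoint interval or a smaller delay value reduces how much progress can be lost in a disaster, at the cost of more frequent checkpointing.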
Additionally, set state.checkpoints.num-retained for your
Deployment via the UI as Additional Deployment configuration to save
the last m standard Flink checkpoints to Blob Storage. As a rule
of thumb, choose a value equal to the disasterCheckpointsDelay value
plus 1. The default value is 1.

Set the disasterCheckpointsDelay configuration property in your
Deployment configuration YAML file:
metadata:
  displayName: disaster-checkpoints
spec:
  deploymentTargetName: vvp-jobs
  template:
    spec:
      artifact:
        jarUri: >-
          s3://vvp-snapshot-blob-storage-eu-west-1/artifacts/namespaces/default/TopSpeedWindowing.jar
        kind: JAR
      disasterCheckpointsDelay: 5
where:
- A value of 0 disables saving the Disaster Recovery Checkpoint.
- An integer value > 0 specifies which Flink checkpoint will be saved in the job as the Disaster Recovery Checkpoint, so that for a value n, every n-th checkpoint is saved.
Ensure that you set the required Additional Configuration for the Deployment:
state.checkpoints.num-retained: m, where m is typically the disasterCheckpointsDelay value + 1.
Ververica Platform checks the value during Deployment validation. If no value is set, or if the value set is too low, Ververica Platform shows a pop-up warning and suggests a value.
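For the example above with disasterCheckpointsDelay: 5, the rule of thumb gives m = 5 + 1 = 6. The fragment below is a minimal sketch that assumes the Additional Configuration entry corresponds to the flinkConfiguration section of the Deployment template. Check the value against the suggestion shown during Deployment validation.

```yaml
spec:
  template:
    spec:
      flinkConfiguration:
        # m = disasterCheckpointsDelay + 1 = 5 + 1
        # Quoted on the assumption that configuration values are strings.
        state.checkpoints.num-retained: '6'
```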

For more information about configuring a Deployment, see the Deployments documentation, which includes a full configuration example.