Version: 2.13

Disaster Recovery Checkpoint

Disaster Recovery is an essential part of a High Availability setup.

In a disaster recovery scenario where Kubernetes has failed completely, the goal is to relaunch Ververica Platform with all Deployments starting from their latest known state. To enable this, Ververica Platform stores in the job a permanent reference to a Disaster Recovery Checkpoint.

Configuration

The elapsed time between Disaster Recovery Checkpoints depends on the checkpoint configuration you set and on the monitoring frequency of the Ververica Platform Controller. The monitoring interval is 3 seconds.

To configure a Disaster Recovery Checkpoint, set a value n for the disasterCheckpointsDelay configuration property in your Deployment configuration YAML file. With the property enabled, every n-th standard Flink checkpoint will be saved to the job as the updated Disaster Recovery Checkpoint.

Additionally, set state.checkpoints.num-retained for your Deployment via the UI as Additional Deployment configuration to save the last m standard Flink checkpoints to Blob Storage. As a rule of thumb, choose m equal to the disasterCheckpointsDelay value plus 1. The default value of state.checkpoints.num-retained is 1.

[Figure: Set the Checkpoints Value]

Set the disasterCheckpointsDelay configuration property in your Deployment configuration YAML file:

metadata:
  displayName: disaster-checkpoints
spec:
  deploymentTargetName: vvp-jobs
  template:
    spec:
      artifact:
        jarUri: >-
          s3://vvp-snapshot-blob-storage-eu-west-1/artifacts/namespaces/default/TopSpeedWindowing.jar
        kind: JAR
  disasterCheckpointsDelay: 5

where:

  • A value of 0 disables saving the Disaster Recovery Checkpoint.
  • An integer value n > 0 saves every n-th Flink checkpoint to the job as the Disaster Recovery Checkpoint.

Ensure that you set the required Additional Configuration for the Deployment:

  • state.checkpoints.num-retained: m, where m is typically the disasterCheckpointsDelay value plus 1.
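
As a rough sketch of how the two settings fit together, the fragment below assumes that the Additional Configuration entered in the UI ends up under spec.template.spec.flinkConfiguration in the Deployment YAML, and it uses a 60-second checkpoint interval purely for illustration. Neither the interval nor the placement is stated on this page, so verify both against your own Deployment before relying on them.

```yaml
spec:
  template:
    spec:
      flinkConfiguration:
        # Assumed interval, for illustration only (not from the example above).
        execution.checkpointing.interval: 60s
        # Rule of thumb: disasterCheckpointsDelay (5 in the example above) + 1.
        state.checkpoints.num-retained: '6'
        # With these assumed values, every 5th checkpoint is saved, so the
        # Disaster Recovery Checkpoint would be refreshed roughly every
        # 5 x 60s = 5 minutes, and picked up within the 3s monitoring interval.
```

Retaining one checkpoint more than the delay is the rule of thumb given above.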

Ververica Platform checks the state.checkpoints.num-retained value during Deployment validation. If no value is set, or if the value is too low, Ververica Platform shows a pop-up warning and suggests a value.

[Figure: No Checkpoints Configured]

For more about configuring a Deployment, see the Deployments documentation, which includes a full configuration example.