Disaster Recovery Checkpoint
Disaster Recovery is an essential part of a High Availability setup.
In a disaster recovery scenario where there has been a full Kubernetes failure, the goal is to relaunch Ververica Platform with all deployments starting from the latest known state. To enable this, Ververica Platform stores a permanent reference in the job to a Disaster Recovery Checkpoint.
Configuration
The elapsed time between Disaster Recovery Checkpoints is determined by the checkpoint configuration you set and by the Controller monitoring frequency for Ververica Platform. The monitoring interval is 3s.
To configure a Disaster Recovery Checkpoint, set a value n for the disasterCheckpointsDelay configuration property in your Deployment configuration YAML file. With the property enabled, every n-th standard Flink checkpoint is saved to the job as the updated Disaster Recovery Checkpoint.
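As a rough illustration of how these settings interact, the fragment below assumes a Flink checkpoint interval of 60 seconds, set through the standard Flink option execution.checkpointing.interval, together with a delay of 5. The interval value is illustrative only, and the nesting of disasterCheckpointsDelay shown here is an assumption. Under these assumptions the Disaster Recovery Checkpoint would be updated roughly every 5 × 60s = 300s, presumably plus up to one 3s monitoring interval before the Controller picks up the new checkpoint.

```yaml
spec:
  template:
    spec:
      # Assumed placement of the property; verify against your Deployment YAML.
      disasterCheckpointsDelay: 5        # every 5th Flink checkpoint is saved
      flinkConfiguration:
        # Standard Flink option; 60s is an illustrative value, not a recommendation.
        execution.checkpointing.interval: 60s
```

A shorter checkpoint interval or a smaller delay narrows the window of state that could be lost, at the cost of more frequent checkpoint activity.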
Additionally, set state.checkpoints.num-retained for your Deployment via the UI as Additional Deployment configuration to save the last m standard Flink checkpoints to Blob Storage. As a rule of thumb, choose a number equal to the disasterCheckpointsDelay value plus 1. The default value is 1.
Set the disasterCheckpointsDelay configuration property in your Deployment configuration YAML file:
metadata:
  displayName: disaster-checkpoints
spec:
  deploymentTargetName: vvp-jobs
  template:
    spec:
      artifact:
        jarUri: >-
          s3://vvp-snapshot-blob-storage-eu-west-1/artifacts/namespaces/default/TopSpeedWindowing.jar
        kind: JAR
      disasterCheckpointsDelay: 5
where:
- A value of 0 disables saving the Disaster Recovery Checkpoint.
- An integer value > 0 specifies which Flink checkpoint is saved in the job as the Disaster Recovery Checkpoint, so that for a value n, every n-th checkpoint is saved.
Ensure that you set the required Additional Configuration for the Deployment: state.checkpoints.num-retained: m, where m is typically the disasterCheckpointsDelay value + 1.
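A minimal sketch of the same setting in YAML form, assuming that Additional Configuration entries from the UI appear under spec.template.spec.flinkConfiguration in the Deployment YAML, and using the delay of 5 from the example above:

```yaml
spec:
  template:
    spec:
      flinkConfiguration:
        # m = disasterCheckpointsDelay (5) + 1, following the rule of thumb.
        # Quoted because Flink configuration values are passed as strings.
        state.checkpoints.num-retained: '6'
```

Retaining one more checkpoint than the delay is intended to keep the checkpoint referenced as the Disaster Recovery Checkpoint available in Blob Storage until the next one replaces it.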
Ververica Platform checks the value during Deployment validation. If no value is set, or if the value is too low, Ververica Platform shows a pop-up warning and suggests a value.
For more about configuring a Deployment, see the Deployments documentation, which includes a full configuration example.